Applicants: Hagen, et al.

Serial No.: National Stage Filing of PCT/EP98/08216
Filing date: Herewith 

Title: Regulatory DNA Sequences of the Human Catalytic Telomerase Sub- 
Unit Gene, Diagnostic Therapeutic Use Thereof 



PRELIMINARY AMENDMENT 

Box PCT 

Assistant Commissioner for Patents 
Washington, D.C. 20231 

Sir: 

This Preliminary Amendment is submitted in the above-captioned national 
stage application of PCT/EP98/08216 filed on even date herewith. Please amend the 
application as follows: 

In the Claims 

Please cancel claim 7. 

Please amend claims 4, 6 and 8-12 as follows: 

4. (Amended) Recombinant construct which contains a DNA sequence 
according to [one of] Claim[s] 1 [to 3]. 

6. (Amended) Vector which contains a recombinant construct according to 
Claim 4 [or 5]. 



8. (Amended) Recombinant host cells which harbour recombinant constructs 
or vectors according to [one of] Claim[s] 4 [to 6]. 
9. (Amended) Process for identifying substances which affect the promoter
activity, silencer activity or enhancer activity of the human catalytic 
telomerase subunit, comprising the following steps: 

A. adding a candidate substance to a host cell which harbours DNA 
sequences according to [one of] Claim[s] 1 [to 3] which sequences are
functionally linked to a reporter gene, and 

B. measuring the effect of the substance on expression of the reporter gene. 

10. (Amended) Process for identifying factors which bind specifically to the 
DNA according to [one of] Claim[s] 1 [to 3], or to fragments thereof, 
characterized in that an expression cDNA library is screened using a DNA 
sequence according to [one of] Claim[s] 1 [to 3], or sub-fragments of widely 
differing length, as the probe. 

11. (Amended) Transgenic animals which harbour recombinant constructs or 
vectors according to Claim[s] 4 [to 6]. 

12. (Amended) Process for detecting telomerase-associated conditions in a 
patient, comprising the following steps: 

A. incubating a recombinant construct or vector according to Claim[s] 4 [to 
6], which additionally contains a reporter gene, with body fluids or cell 
samples, 

B. detecting the activity of the reporter gene in order to obtain a diagnostic 
value, and 

C. comparing the diagnostic value with standard values for the reporter 
gene construct in standardized normal cells or body fluids of the same 
type as the test sample. 



Please add the following new claim 13.

13. (New) A medicament comprising a recombinant construct or vector according
to claim 4.

Remarks

By way of this Preliminary Amendment, claims 1-6 and 8-13 are pending in
the application. Claims 4, 6 and 8-12 have been amended. Claim 13 has been added.
These claim amendments, cancellations and additions are being made solely to
remove multiple claim dependencies from the claims and to place the claims in a
format appropriate for U.S. prosecution.

Applicants believe that the subject matter of the pending claims is patentable
and that the instant application should accordingly be allowed. If the Examiner
believes that a conversation with Applicants' attorney would be helpful in expediting
prosecution of this application, the Examiner is invited to call the undersigned
attorney at (203) 812-3964.



Respectfully submitted,

Reg. No. 41,670

400 Morgan Lane
West Haven, CT 06516
(Tel) (203) 812-3964
(Fax) (203) 812-5492
e-mail: jerrie.chiu.b@bayer.com
Regulatory DNA sequences of the gene for the human catalytic telomerase
subunit and their diagnostic and therapeutic use 

Structure and function of the chromosome ends 

The genetic material of eukaryotic cells is distributed on linear chromosomes. The 
ends of hereditary units are termed telomeres, derived from the Greek words telos 
(end) and meros (part, segment). Most telomeres consist of repeats of short sequences 
which are mainly composed of thymine and guanine (Zakian, 1995). In all the 
vertebrates which have so far been investigated, the telomeres consist of the sequence
TTAGGG (Meyne et al., 1989).

The telomeres have a variety of important functions. They prevent the fusion of 
chromosomes (McClintock, 1941) and thus the formation of dicentric hereditary 
units. Such chromosomes having two centromeres can lead to the development of
cancer due to loss of heterozygosity, or to duplication or loss of genes.

In addition, telomeres serve the purpose of distinguishing intact hereditary units from 
damaged hereditary units. Thus, yeast cells ceased their cell division when they 
contained a chromosome without a telomere (Sandell and Zakian, 1993).

Telomeres fulfil another important task in association with the replication of
eukaryotic cell DNA. In contrast to the circular genomes of prokaryotes, the linear
chromosomes of eukaryotes cannot be completely replicated by the DNA polymerase
complex. RNA primers are required to initiate DNA replication. After elimination of
the RNA primers, extension of the Okazaki fragments and subsequent ligation, the
newly synthesized DNA strand lacks the 5' end since the RNA primer cannot be
replaced by DNA at that point. Without special protective mechanisms, the
chromosomes would therefore shrink with each cell division ("end-replication
problem"; Harley et al., 1990). The non-coding telomere sequences presumably
constitute a buffer zone for preventing the loss of genes (Sandell and Zakian, 1993).
In addition to this, telomeres also play an important role in regulating cell ageing
(Olovnikov, 1973). Human somatic cells exhibit a limited capacity for replication in
culture; after a certain period of time, they become senescent. In this state, the cells
no longer divide even after having been stimulated with growth factors; however,
they do not die and remain metabolically active (Goldstein, 1990). Various
observations support the hypothesis that a cell determines how many more times it
can divide on the basis of the length of its telomeres (Allsopp et al., 1992).

In summary, the telomeres consequently possess key functions in the ageing of cells,
and in stabilizing the genetic material and preventing cancer.

The enzyme telomerase synthesizes the telomeres 

As described above, organisms which possess linear chromosomes can only replicate
their genome incompletely in the absence of a special protective mechanism. Most
eukaryotes use a special enzyme, i.e. telomerase, for regenerating the telomere
sequences. Telomerase is expressed constitutively in the single-cell organisms which
have so far been investigated. On the other hand, telomerase activity has only been
measured in humans in germ cells and tumour cells, whereas neighbouring somatic
tissue did not contain any telomerase (Kim et al., 1994).

Telomerase can also be designated functionally as terminal telomere transferase,
which is located in the cell nucleus as a multiprotein complex. While the RNA
moiety of human telomerase has been known for a relatively long period of time
(Feng et al., 1995), the catalytic subunit of this enzyme group was recently identified
in a variety of organisms (Lingner et al., 1997; cf. our application PCT/EP/98/03468
which is likewise pending). These catalytic subunits of telomerase are strikingly
homologous both among themselves and in relation to all previously known reverse
transcriptases.

WO 98/14592 also describes nucleic acid and amino acid sequences of the catalytic 
telomerase subunit. 
Activation of telomerase in human tumours

It was originally only possible to demonstrate telomerase activity in humans in germ
line cells and not in normal somatic cells (Hastie et al., 1990; Kim et al., 1994).
Following the development of a more sensitive detection method (Kim et al., 1994),
a low telomerase activity was also detected in hematopoietic cells (Broccoli et al.,
1995; Counter et al., 1995; Hiyama et al., 1995). These cells nevertheless
exhibited a reduction in the telomeres (Vaziri et al., 1994; Counter et
al., 1995). It has still not been resolved whether the quantity of enzyme in these cells
is insufficient to compensate for the telomere loss or whether the telomerase activity
which is measured stems from a subpopulation, e.g. incompletely differentiated
CD34+38"^ precursor cells (Hiyama et al, 1995). In order to resolve this, it would be 
necessary to detect telomerase activity in a single cell. 

Interestingly, however, significant telomerase activity was detected in a large number
of the tumour tissues which had thus far been tested (1734/2031, 85%; Shay, 1997),
whereas no activity was found in normal somatic tissue (1/196, <1%; Shay, 1997). In
addition, various investigations have shown that the telomeres still shrank in
senescent cells which were transformed with viral oncoproteins and it was only
possible to detect telomerase in the subpopulation which survived the growth crisis
(Counter et al., 1992). The telomeres were also stable in these immortalized cells
(Counter et al., 1992). Similar findings from investigations in mice (Blasco et al.,
1996) support the assumption that reactivation of the telomerase is a late event in
tumorigenesis.
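
The proportions quoted from Shay (1997) can be checked with a short calculation. The sketch below only reproduces the arithmetic on the counts given in the text; the function and variable names are illustrative and not part of the description.

```python
def percent_positive(positive: int, total: int) -> float:
    """Return the share of telomerase-positive samples as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * positive / total


tumour = percent_positive(1734, 2031)  # tumour tissues tested
normal = percent_positive(1, 196)      # normal somatic tissues tested

print(f"tumour tissue: {tumour:.1f}%")
print(f"normal somatic tissue: {normal:.2f}%")
```

The tumour figure rounds to the 85% stated above, and the normal-tissue figure is below 1%.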

Based on these results, a "telomerase hypothesis" was developed which links the loss
of telomere sequences and cell ageing with telomerase activity and the development
of cancer. In long-lived species such as humans, the shrinking of the telomeres can be
regarded as being a mechanism for suppressing tumours. Differentiated cells which
do not contain any telomerase cease their cell division at a particular telomere length.
If such a cell mutates, it can only form a tumour if the cell can extend its telomeres.
Otherwise, the cell would continue to lose telomere sequences until its chromosomes
became unstable and it was finally destroyed. Telomerase reactivation is presumably
the main mechanism used by tumour cells to stabilize their telomeres.

It follows from these observations and considerations that it should be possible to
treat tumours by inhibiting the telomerase. Conventional cancer therapies using
cytostatic agents or short-wave radiation damage all the dividing cells in the body in
addition to the tumour cells. However, since only germ line cells, apart from tumour
cells, contain significant telomerase activity, telomerase inhibitors would attack the
tumour cells more specifically and consequently elicit fewer undesirable side effects.
Telomerase activity has been detected in all the tumour tissues which have so far
been tested, which means that these therapeutic agents could be employed against all
types of cancer. The effect of telomerase inhibitors would then set in when the
telomeres of the cells had shortened to such an extent that the genome became
unstable. Since tumour cells usually possess telomeres which are shorter than those
of normal somatic cells, cancer cells would be the first to be eliminated by the
telomerase inhibitors. By contrast, cells possessing long telomeres, such as the germ
cells, would only be damaged at a much later date. Telomerase inhibitors
consequently represent a potential way forward in the treatment of cancer.

It becomes possible to obtain unambiguous answers to the question of the nature and 
points of attack of physiological telomerase inhibitors once the manner in which
expression of the telomerase gene is regulated has also been identified. 

Regulation of gene expression in eukaryotes

There are a large number of points in eukaryotic gene expression, i.e. the cellular
flow of information from the DNA to the protein by way of the RNA, at which
regulatory mechanisms can exert an effect. Examples of individual control steps are
gene amplification, the recombination of gene loci, chromatin structure, DNA
methylation, transcription, post-transcriptional modifications of mRNA, mRNA
transport, translation and post-translational modifications of proteins. Studies which
have been carried out to date indicate that control at the level of transcription
initiation is of the greatest importance (Latchman, 1991).

A region which is responsible for regulating transcription, and which is designated
the promoter region, is located directly upstream of the transcription start of a gene
which is transcribed by RNA polymerase II. Comparison of the nucleotide sequences
of promoter regions from a large number of known genes shows that particular
sequence motifs occur regularly in this region. These elements include, inter alia, the
TATA box, the CCAAT box and the GC box, which elements are recognized by
specific proteins. The TATA box, which is located about 30 nucleotides upstream of
the transcription start, is, for example, recognized by the TFIID subunit TBP ("TATA
box-binding protein"), whereas particular GC-rich sequence segments are specifically
bound by the transcription factor Sp1 ("specificity protein 1").
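
Searching a promoter region for such motifs can be sketched in a few lines. The example below is an illustration only: the motif patterns are simplified textbook consensus sequences chosen as assumptions, the test sequence is invented, and only the given strand is scanned.

```python
import re

# Simplified consensus patterns (assumptions for illustration only).
MOTIFS = {
    "TATA box": r"TATA[AT]A",
    "CCAAT box": r"CCAAT",
    "GC box": r"GGGCGG",
}


def find_motifs(sequence: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each motif on the given strand."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Invented example sequence, not taken from the hTC promoter.
example = "GGGCGGAACCAATCGTATAAAGGC"
print(find_motifs(example))
```

A real analysis would also scan the reverse strand and use position weight matrices rather than fixed patterns.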

The promoter can be functionally subdivided into a regulatory segment and a
constitutive segment (Latchman, 1991). The constitutive control region comprises the
so-called core promoter which enables transcription to be initiated correctly. This
promoter contains the sequence elements which are described as UPEs (upstream
promoter elements) which are necessary for efficient transcription. The regulatory
control segments, which can be interlaced with the UPEs, possess sequence elements
which can be involved in the signal-dependent regulation of transcription by
hormones, growth factors, etc. They impart tissue-specific or cell-specific promoter
properties.

DNA segments which are able to exert an influence on gene expression over
relatively large distances are a characteristic feature of eukaryotic genes. These
elements can be located upstream or downstream of a transcription unit, or within the
unit, and can perform their function independently of their orientation. These
sequence segments may reinforce (enhancers) or attenuate (silencers) promoter
activity. In a similar way to the promoter regions, enhancers and silencers also
accommodate several binding sites for transcription factors.
The invention relates to the DNA sequences from the 5'-flanking region of the gene
for the catalytically active human telomerase subunit and intron sequences for this
gene.

The invention particularly relates to the 5'-flanking regulatory DNA sequence which
contains the promoter DNA sequence for the gene for the human catalytic telomerase
subunit, as depicted in Fig. 10 (SEQ ID NO 3).

The invention furthermore relates to part regions of the 5'-flanking regulatory DNA
sequence, as depicted in Fig. 4 (SEQ ID NO 1), which have a regulatory effect.

Intron sequences for the gene for the human catalytic telomerase subunit, in
particular those sequences which have a regulatory effect, are also part of the
subject-matter of the present invention. The intron sequences according to the invention are
described in detail in the context of Example 5 (cf. SEQ ID NO 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19 and 20).

The invention furthermore relates to a recombinant construct which comprises the
DNA sequences according to the invention, in particular the 5'-flanking DNA
sequence of the gene for the human catalytic telomerase subunit, or part regions
thereof.

Preference is given to recombinant constructs which, in addition to the DNA 
sequences according to the invention, in particular the 5'-flanking DNA sequence of 
the gene for the human catalytic telomerase subunit, or part regions thereof, also
contain one or more additional DNA sequences which encode polypeptides or 
proteins. 

According to a particularly preferred embodiment, these additional DNA sequences
encode antineoplastic proteins.
Particular preference is given to those antineoplastic proteins which inhibit
angiogenesis directly or indirectly. Examples of these proteins are: 

Plasminogen activator inhibitor (PAI-1), PAI-2, PAI-3, angiostatin, endostatin,
platelet factor 4, TIMP-1, TIMP-2, TIMP-3 and leukaemia inhibitory factor (LIF).

Antineoplastic proteins which have a direct or indirect cytostatic effect on tumours
are likewise particularly preferred. These proteins include, in particular: 

perforin, granzyme, IL-2, IL-4, IL-12, interferons, such as IFN-α, IFN-β and
IFN-γ, TNF, TNF-α, TNF-β, oncostatin M; tumour suppressor genes, such as p53,
retinoblastoma.

Particular preference is furthermore given to antineoplastic proteins which, where 
appropriate in addition to their antineoplastic effect, stimulate inflammations and
thereby contribute to the elimination of tumour cells. Examples of these proteins are: 

RANTES, monocyte chemotactic and activating factor (MCAF), IL-8, macrophage
inflammatory protein (MIP-1α, -β), neutrophil activating protein-2 (NAP-2), IL-3,
IL-5, human leukaemia inhibitory factor (LIF), IL-7, IL-11, IL-13, GM-CSF, G-CSF and
M-CSF.

Particular preference is furthermore given to antineoplastic proteins which, due to 
their action as enzymes, are able to convert precursors of an antineoplastic active 
compound into an antineoplastic active compound. Examples of these enzymes are:

herpes simplex virus thymidine kinase, varicella zoster virus thymidine kinase,
bacterial nitroreductase, bacterial β-glucuronidase, plant β-glucuronidase from Secale
cereale, human glucuronidase, human carboxypeptidase, bacterial carboxypeptidase,
bacterial β-lactamase, bacterial cytosine deaminase, human catalase and/or
phosphatase, human alkaline phosphatase, type 5 acid phosphatase, human
lysooxidase, human acid D-aminooxidase, human glutathione peroxidase, human
eosinophil peroxidase and human thyroid peroxidase.

The abovementioned recombinant constructs can also contain DNA sequences which 
encode factor VIII or factor IX, or part fragments thereof. These DNA sequences also
include other blood clotting factors. 

The abovementioned recombinant constructs can also contain DNA sequences which 
encode a reporter protein. Examples of these reporter proteins are:

Chloramphenicol acetyl transferase (CAT), glow-worm luciferase (LUC),
β-galactosidase (β-Gal), secreted alkaline phosphatase (SEAP), human growth hormone
(hGH), β-glucuronidase (GUS), green fluorescent protein (GFP), and all the variants
derived therefrom, aequorin and obelin.

Recombinant constructs according to the invention can also contain DNA which 
encodes the human catalytic telomerase subunit and its variants and fragments in the 
antisense orientation. Where appropriate, these constructs can also contain other 
protein subunits of the human telomerase and the telomerase RNA component in the 
antisense orientation.

The recombinant constructs can, in addition to the DNA which encodes the human 
catalytic telomerase subunit, and its variants and fragments, also contain other 
protein subunits of the human telomerase and the telomerase RNA component. 

The invention furthermore relates to a vector which contains the abovementioned 
DNA sequences according to the invention, in particular the 5'-flanking DNA
sequences and also one or more of the other DNA sequences mentioned above. 

The preferred vector for these constructs is a virus, for example a retrovirus, an
adenovirus, an adeno-associated virus, a herpes simplex virus, a vaccinia virus, a
lentivirus, a Sindbis virus or a Semliki Forest virus.
Preference is also given to using plasmids as vectors.

The invention furthermore relates to pharmaceutical preparations which comprise 
recombinant constructs or vectors according to the invention; for example a
preparation in a colloidal dispersion system. 

Examples of suitable colloidal dispersion systems are liposomes or polylysine 
ligands. 

The preparations of the constructs or vectors according to the invention in colloidal 
dispersion systems can be supplemented with a ligand which binds to the membrane 
structures of tumour cells. Such a ligand can, for example, be attached to the
construct or the vector or else be a component of the liposome structure. 

Suitable ligands are, in particular, polyclonal or monoclonal antibodies, or antibody 
fragments thereof, which bind, by their variable domains, to the membrane structures 
of tumour cells, or substances carrying mannose terminally, cytokines or growth 
factors, or fragments or part sequences thereof, which bind to receptors on tumour
cells.

Examples of corresponding membrane structures are receptors for a cytokine or a 
growth factor, such as IL-1, EGF, PDGF, VEGF, TGF-β, insulin or insulin-like
growth factor (ILGF), or adhesion molecules, such as SLeX, LFA-1, MAC-1,
LECAM-1 or VLA-4, or the mannose-6-phosphate receptor.

The present invention includes pharmaceutical preparations which, in addition to the
vector constructs according to the invention, can also comprise non-toxic, inert,
pharmaceutically suitable excipients. It is conceivable to administer these
preparations (e.g. intravenously, intraarterially, intramuscularly, subcutaneously,
intradermally, anally, vaginally, nasally, transdermally, intraperitoneally, as an
aerosol or orally) at the site of a tumour or to administer them systemically.
The vector constructs according to the invention can be employed in gene therapy.

The invention furthermore relates to a recombinant host cell, in particular a 
recombinant eukaryotic host cell, which harbours the above-described constructs or
vectors. 

The invention furthermore relates to a process for identifying substances which affect 
the promoter activity, silencer activity or enhancer activity of the catalytic telomerase 
subunit, with this process comprising the following steps:

A. adding a candidate substance to a host cell which harbours the regulatory
DNA sequence according to the invention, in particular the 5'-flanking
regulatory DNA sequence for the gene for the human catalytic telomerase
subunit, or a part region thereof which has a regulatory effect, which sequence
or part region is functionally linked to a reporter gene, and

B. measuring the effect of the substance on expression of the reporter gene. 

The process can be employed for identifying substances which increase the promoter
activity, silencer activity or enhancer activity of the catalytic telomerase subunit.

The process can furthermore be employed for identifying substances which inhibit
the promoter activity, silencer activity or enhancer activity of the catalytic
telomerase subunit.
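
The evaluation in step B can be illustrated as follows. This is a sketch under stated assumptions: reporter readings are taken as plain numbers, and the two-fold thresholds and the function name are hypothetical choices, not values from the description.

```python
def classify_substance(treated: float, untreated: float,
                       up: float = 2.0, down: float = 0.5) -> str:
    """Classify a candidate by the ratio of reporter signal with/without it.

    The thresholds `up` and `down` are hypothetical cut-offs.
    """
    if untreated <= 0:
        raise ValueError("untreated reporter signal must be positive")
    ratio = treated / untreated
    if ratio >= up:
        return "increases reporter expression"
    if ratio <= down:
        return "inhibits reporter expression"
    return "no clear effect"


print(classify_substance(treated=1500.0, untreated=500.0))
print(classify_substance(treated=100.0, untreated=500.0))
```

In practice the cut-offs would be set from replicate measurements and suitable controls rather than fixed in advance.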

The invention furthermore relates to a process for identifying factors which bind
specifically to the DNA sequences according to the invention, or to fragments thereof, in
particular the 5'-flanking regulatory DNA sequence of the catalytic telomerase
subunit. This method comprises screening an expression cDNA library using the
above-described DNA sequence, or subfragments of widely differing length, as the
probe.
The above-described constructs or vectors can also be used for preparing transgenic
animals. 

The invention furthermore relates to a process for detecting telomerase-associated
conditions in a patient, which process comprises the following steps:

A. incubating a construct or vector, which contains the DNA sequence according
to the invention, in particular the 5'-flanking regulatory DNA sequence for the
gene for the human catalytic telomerase subunit, or a part region thereof
having a regulatory effect, and a reporter gene, with body fluids or cell
samples,

B. detecting the activity of the reporter gene in order to obtain a diagnostic value,
and

C. comparing the diagnostic value with standard values for the reporter gene
construct in standardized normal cells or body fluids of the same type as the
test sample.

The detection of diagnostic values which are higher or lower than the standard 
comparative values indicates a telomerase-associated condition, which in turn 
indicates a pathogenic condition. 

Explanation of the figures:

Fig. 1: Southern blot analysis using genomic DNA from various species

A: Photograph of an ethidium bromide-stained 0.7% agarose gel
containing approximately 4 µg of Eco RI-cut genomic DNA. Track 1
contains Hind III-cut λ DNA as size markers (23.5, 9.4, 6.7, 4.4, 2.3, 2.0
and 0.6 kb). Tracks 2 to 10 contain human, rhesus monkey, Sprague
Dawley rat, BALB/c mouse, dog, bovine, rabbit, chicken and yeast
(Saccharomyces cerevisiae) genomic DNA.



B: Autoradiogram, corresponding to Fig. 1A, of a Southern blot analysis
in which radioactively labelled hTC-cDNA probe of about 720 bp in
length is used for the hybridization.



Fig. 2: Restriction analysis of the recombinant λ DNA of the phage clone P12,
which hybridizes with a probe from the 5' region of the hTC cDNA.

The figure shows a photograph of an ethidium bromide-stained 0.4%
agarose gel. Tracks 1 and 2 contain Eco RI/Hind III-cut λ DNA and a
1 kb ladder from Gibco as size markers. Tracks 3-7 each contain 250 ng
of the DNA from the recombinant phage which has been cut with Bam
HI (track 3), Eco RI (track 4), Sal I (track 5), Xho I (track 6) and Sac I
(track 7). The arrows mark the two λ arms of the vector EMBL3 Sp6/T7.



Fig. 3: Restriction analysis and Southern blot analysis of the recombinant
λ DNA of the phage clone which hybridizes with a probe from the 5'
region of the hTC cDNA.



A: The figure shows a photograph of an ethidium bromide-stained 0.8%
agarose gel. Tracks 1 and 15 contain a 1 kb ladder from Gibco as size
markers. Tracks 2 to 14 each contain 250 ng of cut λ DNA from the
recombinant phage clone. The following enzymes were employed: track
2: Sac I, track 3: Xho I, track 4: Xho I, Xba I, track 5: Sac I, Xho I, track
6: Sal I, Xho I, Xba I, track 7: Sac I, Xho I, Xba I, track 8: Sac I, Sal I,
Xba I, track 9: Sac I, Sal I, BamH I, track 10: Sac I, Sal I, Xho I, track 11:
Not I, track 12: Sma I, track 13: empty, track 14: not digested.



Le A 32 805-Foreign Countries 



B: Autoradiogram, corresponding to Fig. 3A, of a Southern blot analysis.
A 5'-hTC cDNA fragment of about 420 bp in length was used as the 
probe for the hybridization. 

Fig. 4: Partial DNA sequence of the 5'-flanking region and of the promoter of
the gene for the human catalytic telomerase subunit. The ATG start 
codon in the sequence is printed in bold. The depicted sequence 
corresponds to SEQ ID NO 1. 

Fig. 5: Use of primer extension analysis to identify the transcription start.

The figure shows an autoradiogram of a denaturing polyacrylamide gel
which was selected for depicting a primer extension analysis. An
oligonucleotide having the sequence
5' GTTAAGTTGTAGCTTACACTGGTTCTC 3' was used as the primer.
The primer extension reaction was loaded in track 1. Tracks G, A, T and
C constitute the sequence reactions using the same primer and the
corresponding dideoxynucleotides. The thick arrow marks the main
transcription start while the thin arrows point to three subsidiary
transcription start points.
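
Basic properties of the primer quoted above can be derived directly from its sequence. The sketch below is illustrative and the helper names are assumptions. Because a primer-extension primer anneals to the transcript, its reverse complement corresponds to the stretch of the transcript to which it binds.

```python
PRIMER = "GTTAAGTTGTAGCTTACACTGGTTCTC"  # 5' -> 3', as given in the legend

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5' -> 3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


print(len(PRIMER))
print(reverse_complement(PRIMER))
print(f"{gc_fraction(PRIMER):.2f}")
```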

Fig. 6: cDNA sequence of the human catalytic telomerase subunit (hTC; cf. our 
pending application PCT/EP/98/03468). The depicted sequence 
corresponds to SEQ ID NO 2. 

Fig. 7: Structural organization and restriction map of the human hTC gene and
its 5'-flanking and 3'-flanking regions.

Exons are shown as consecutively numbered rectangles which are filled
in in black, and introns are shown as regions which are not filled in.
Untranslated sequence segments in the exons are hatched. Translation
starts in exon 1 and ends in exon 16. Restriction enzyme cleavage sites
are marked as follows: S, Sac I; X, Xho I. The relative arrangement of the
five phage clones (P2, P3, P5, P12, P17), and of the product from the
genome walking, are shown by thin lines. As the dots indicate, the
sequence of intron 16 has only been partly deciphered.

Fig. 8: hTC splice variants.

A: Diagrammatic structure of the hTC mRNA splice variants. The 
complete hTC mRNA is depicted as a rectangle with a grey background 
in the upper region of the figure. The 16 exons are depicted in accordance 
with their size. The translation start (ATG) and the stop codon, and also 
the telomerase-specific T motif, and the seven RT motifs, are all shown. 
The hTC variants are subdivided into deletion and insertion variants. The 
missing exon sequences are marked in the deletions. The insertions are 
shown by additional white rectangles. The sizes and origins of the 
inserted sequences are given. Newly formed stop codons are marked. The 
size of the insertion in variant INS2 is unknown. 

B: Exon-intron transitions in the hTC splice variants. Unspliced 5'-flanking
and 3'-flanking sequences are shown as white rectangles. The
origins of the exon and intron sequences are given. Intron and exon
sequences are shown in small letters and large letters, respectively. The 
donor and acceptor sequences in the splice sites are underlaid as grey 
rectangles, and their exon and intron origins are also given. 

Fig. 9: Identification of the transcription start by means of RT-PCR analysis.

The RT-PCR was carried out using a cDNA library prepared from HL 60
cells and genomic DNA as the positive control. A common 3' primer
hybridizes to a region of the exon 1 sequence. The positions of the
different 5' primers in the coding region or the 5'-flanking region are
given. In the negative control, no template DNA was added to the PCR
reaction. M: DNA size marker.



Fig. 10: Nucleotide sequence and structural features of the hTC promoter.

The figure depicts 11273 bp of the 5'-flanking hTC gene sequence,
beginning with the translation start codon ATG (+1). The putative region
of the translation start is underlined. Possible regulatory sequence
segments within the 4000 bp upstream of the translation start are ringed.
The depicted sequence corresponds to SEQ ID NO 3.



Fig. 11: Activity of the hTC promoter in HEK-293 cells.

The first 5000 bp of the 5'-flanking hTC gene region are shown
diagrammatically in the upper part of the figure. The ATG start codon is 
picked out. CpG-rich islands are marked by grey rectangles. The sizes of 
the hTC promoter-luciferase construct are shown on the left-hand side of 
the figure. The promoterless pGL2 basic construct and the SV40 
promoter construct pGL2-Pro were used as controls in each transfection. 
The relative luciferase activities of the different promoter constructs in 
HEK cells are shown as continuous bars on the right-hand side of the 
figure. The standard deviation is indicated. The numerical values 
represent the average of two independent experiments which were carried 
out in duplicate. 
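
The relative activities described in this legend can be computed along the following lines. The sketch assumes that each construct's readings are expressed relative to the promoterless pGL2 basic control and then summarised as mean and standard deviation; the normalisation scheme and all numbers are invented for illustration and are not data from the figure.

```python
from statistics import mean, stdev


def relative_activity(readings: list[float],
                      control: list[float]) -> tuple[float, float]:
    """Return mean and sample standard deviation of readings / mean(control)."""
    baseline = mean(control)
    if baseline <= 0:
        raise ValueError("control mean must be positive")
    ratios = [r / baseline for r in readings]
    return mean(ratios), stdev(ratios)


# Invented luciferase readings (arbitrary light units): two experiments,
# each in duplicate, give four values per construct.
pgl2_basic = [100.0, 110.0, 90.0, 100.0]
construct = [800.0, 1000.0, 1200.0, 1000.0]

avg, sd = relative_activity(construct, pgl2_basic)
print(f"relative activity: {avg:.1f} +/- {sd:.1f}")
```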



Tab. 1: Exon-intron transitions in the hTC gene 

The table lists the nucleotide sequences at the 3' and 5' splice transitions 
of the hTC gene. The consensus sequences at the acceptor and donor 
sites (AG and GT, respectively) are underlaid with grey rectangles. The 
table shows the intron sequences (small letters) and exon sequences (large 
letters) which flank the splice acceptor and donor sites. The sizes of the 
exons and introns are given in bp. 
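
The notation used in Tab. 1 (intron in small letters, exon in large letters, GT at the donor site and AG at the acceptor site) can be illustrated with a minimal sketch. The function name and the two junction strings are hypothetical examples chosen for illustration; they are not taken from the hTC gene.

```python
def splice_signals(donor_junction: str, acceptor_junction: str) -> dict:
    """Check the GT ... AG dinucleotides at the two ends of an intron.

    donor_junction:    exon (uppercase) followed by the intron start (lowercase)
    acceptor_junction: intron end (lowercase) followed by the exon (uppercase)
    """
    # Keep only the lowercase (intron) part of each junction.
    intron_start = "".join(c for c in donor_junction if c.islower())
    intron_end = "".join(c for c in acceptor_junction if c.islower())
    return {
        "donor_gt": intron_start[:2] == "gt",
        "acceptor_ag": intron_end[-2:] == "ag",
    }


# Hypothetical junctions written in the table's upper/lower-case convention.
result = splice_signals("CAGgtaagt", "ttttcagGTC")
print(result)
```

A junction that fails either check would be a candidate for closer inspection, for example a sequencing error or a non-canonical splice site.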



Tab. 2: Potential binding sites for DNA-binding factors in the nucleotide 
sequence of intron 2 
The search for possible DNA-binding factors (e.g. transcription factors) 
was carried out using the "find pattern" algorithm from the Genetics 
Computer Group (Madison, USA) GCG sequence analysis program 
package. The table lists the abbreviations of the DNA-binding factors 
which were identified and their location in intron 2. 
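
The kind of pattern search described above can be sketched as follows. This is a minimal illustration using Python's standard `re` module, not the GCG "find pattern" program itself; the motif strings and the toy sequence are assumptions chosen for the example and are not the pattern definitions used to produce Tab. 2.

```python
import re


def find_sites(sequence: str, pattern: str) -> list[int]:
    """Return the 1-based start position of every match, including overlaps."""
    # A lookahead lets the regex engine report overlapping occurrences.
    regex = re.compile(f"(?=({pattern}))", re.IGNORECASE)
    return [m.start() + 1 for m in regex.finditer(sequence)]


# Hypothetical motifs and a toy sequence (illustration only).
motifs = {"E-box": "CACGTG", "TATA-like": "TATAAA"}
toy_intron = "ttCACGTGaaTATAAAggCACGTGcc"

hits = {name: find_sites(toy_intron, pat) for name, pat in motifs.items()}
print(hits)
```

Each factor then maps to a list of positions, which is the same shape as the "Factors" and "Location in intron 2" columns of Tab. 2.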



Tab. 2 

Factors                   Location in intron 2 

C/EBP                     2925 
CRE.2                     2749 
Sp1                       2378, 4094, 4526, 4787, 4835, 4995 
AP-2 CS3                  5099 
AP-2 CS4                  2213, 3699, 4667, 5878, 5938, 6059, 6180, 6496 
AP-2 CS5                  5350, 5798, 5880, 5940, 6061, 6182, 6375, 6498 
PEA3                      934, 2505 
P53                       2125 
GR uteroglobin            848, 1487, 2956 
PR uteroglobin            3331 
Zeste-white               1577, 1619, 1703, 1745, 1787, 1829, 1871, 1913, 1955, 
                          1997, 2039, 2081, 3518, 3709, 4765, 5014, 5055 
ORE                       846 
MyoD-MCK right site/rev   447, 509, 558, 1370, 1595, 1900, 2028, 2099, 4557 
MyoD-MCK left site        108, 118, 453, 1566, 1608, 1692, 1734, 1818, 1902, 
                          1986, 2372, 2460, 2720, 3491, 5030 
Ets-1 CS                  6408 
AP1                       3784, 4406 
CREB                      2801 
GATA-1                    839, 1390, 3154 
c-Myc                     108, 118, 453, 1566, 1608, 1692, 1734, 1818, 1902, 
                          1986, 2372, 2460, 2720, 3491, 5030 
CACCC site                991 
CCAAT site                1224 
CCAC box                  992 
CAAT site                 463, 2395 
Rb site                   992, 4663 
TATA                      3650 
CDEI                      106, 1564, 1606, 1690, 1732, 1816, 1900, 1984 
Examples 

The human gene for the catalytic telomerase subunit (ghTC), and the regions of this 
gene located 5' and 3', were cloned. In addition, the start point for transcription was 
determined, potential binding sites for DNA-binding proteins were identified and 
active promoter fragments were highlighted. The sequence of the hTC cDNA (Fig. 6) 
has already been reported in our application PCT/EP/98/03468, which is also 
pending. Unless otherwise mentioned, all position data refer to the cDNA position 
in this sequence. 

Example 1 

A genomic Southern blot analysis was used to determine whether ghTC constitutes a 
single gene in the human genome or whether there exist several loci for the hTC gene 
and possibly also ghTC pseudogenes. 

In order to do this, a commercially available zoo blot from Clontech was subjected to 
Southern blot analysis. This blot contains 4 µg of Eco RI-cut genomic DNA from 
nine different species (human, monkey, rat, mouse, dog, bovine, rabbit, chicken and 
yeast). With the exception of yeast, chicken and human, the DNA was isolated from 
kidney tissue. The human genomic DNA was isolated from placenta and the chicken 
genomic DNA was purified from liver tissue. An hTC cDNA fragment of about 720 
bp in length, which was isolated from hTC cDNA, variant Del2 (position 1685 to 
2349 plus 2531 to 2590 in Fig. 6 [deletion 2; cf. Example 5 in Fig. 8]), was used as 
the radioactively labelled probe in the autoradiogram in Fig. 1. The experimental 
conditions for the blot hybridization and washing steps were taken from Ausubel et 
al. (1987). 

In the case of the human DNA, the probe recognizes two specific DNA fragments. 
The smaller Eco RI fragment, of from about 1.5 to 1.8 kb in length, probably 
originates from two Eco RI cleavage sites in an intron in the ghTC DNA. On the 
basis of this result, it is to be assumed that only one single ghTC gene is present in 
the human genome. 

Example 2 

In order to isolate the 5' flanking hTC gene sequence, approx. 1.5 x 10^ phages from 
a human genomic placenta gene library (EMBL 3 SP6/T7 from Clontech, order 
number HL1067j) were hybridized on nitrocellulose filters (0.45 µm; from 
Schleicher and Schuell), in accordance with the manufacturer's instructions, with a 
radioactively labelled 5'-hTC cDNA fragment of about 500 bp in length (position 
839 to 1345 in Fig. 6). The nitrocellulose filters were first incubated, at 42°C for 
two hours, in 2 x SSC (0.3 M NaCl; 0.5 M Tris-HCl, pH 8.0) and then in a 
prehybridization solution (50% formamide; 5 x SSPE, pH 7.4; 5 x Denhardt's 
solution; 0.25% SDS; 100 µg of herring sperm DNA/ml). For the overnight 
hybridization, the prehybridization solution was supplemented with 1.5 x 10^ cpm of 
denatured, radioactively labelled probe/ml of solution. Nonspecifically bound 
radioactive DNA was removed under stringent conditions, i.e. by means of three 
five-minute steps of washing with 2 x SSC; 0.1% SDS at from 55 to 65°C. The filters 
were evaluated by autoradiography. 

The phage clones which were identified in this primary investigation were purified 
(Ausubel et al. (1987)). In subsequent analyses, one phage clone, P12, turned out 
to be potentially positive. A λ DNA preparation carried out on this phage (Ausubel et 
al. (1987)), and the subsequent restriction digestion with enzymes which release the 
genomic insert in fragments, showed that this phage clone contains an insert of 
approx. 15 kb in the vector (Fig. 2). 

In order to isolate the complete hTC gene sequence, from 1 to 1.5 x 10^ 
phages were screened in each of several independent experiments, each with a 
different radioactively labelled probe, as described above. 
The phage clones which were identified in these primary investigations, and which 
were positive for the corresponding probes, were purified. The phage clone P17 was 
found to contain an hTC cDNA fragment of about 250 bp in length (position 1787 to 
2040 in Fig. 6). The phage clone P2 was identified as containing an hTC cDNA 
fragment of about 740 bp in length (position 1685 to 2349 plus 2531 to 2607 in Fig. 
6 [deletion 2; cf. Example 5]). The phage clones P3 and P5 were found to contain a 
3' hTC cDNA fragment of 420 bp in length (position 3047 to 3470 in Fig. 6). After 
the λ DNA had been prepared from these phages, and subsequently subjected to 
restriction digestion with enzymes which release the genomic insert in fragments, the 
inserts were subcloned into plasmids (Example 4). 

Example 3 

In order to investigate whether the 5' end of the hTC cDNA was also present in the 
insert in the recombinant phage clone P12, the λ DNA from this clone was 
hybridized, in a Southern blot analysis, with a radioactively labelled hTC cDNA 
fragment of about 440 bp in length (position 1 to 440 in Fig. 6) from the extreme 5' 
region (Fig. 3). 

Since the isolated λ DNA from the positive clone also hybridizes with the extreme 5' 
end of the hTC cDNA, this phage probably also contains the 5' sequence region 
flanking the ATG start codon. 

Example 4 

In order to subclone the entire 15 kb insert in the positive phage clone P12 in the 
form of subfragments, and subsequently to sequence these fragments, restriction 
endonucleases which, on the one hand, release the entire insert from EMBL3 Sp6/T7 
(cf. Example 2) and, in addition, cut within the insert, were selected for digesting the 
DNA. 
In all, two Xho I subfragments, of about 8.3 and about 6.5 kb in length, respectively, 
and three Sac I subfragments, of about 8.5, about 3.5 and about 3 kb in length, 
respectively, were subcloned into the pBluescript KS(+) vector (from Stratagene). 
The 5123 bp 5'-flanking nucleotide sequence of the ghTC gene region, starting from 
the ATG start codon, was determined by analysing the sequences of these fragments 
(Fig. 4; corresponding to SEQ ID NO 1). Fig. 4 depicts the first 5123 bp (starting 
from the ATG start codon). Fig. 10 depicts the entire cloned 5' sequence 
(corresponding to SEQ ID NO 3). 

In order to subclone the entire insert, of approx. 14.6 kb in size, in phage clone P17 
in the form of subfragments, restriction endonucleases which, on the one hand, 
release the entire insert from EMBL3 Sp6/T7 and, in addition, cut a few times within 
the insert, were selected for digesting the DNA. Three XhoI/BamHI fragments, of 7.1 
kb, 4.2 kb and 1.5 kb in size, respectively, and one BamHI fragment, of 1.8 kb in 
size, were subcloned using a combination digestion with the enzymes 
XhoI and BamHI. Combination restriction digestion with the enzymes XhoI and XbaI 
resulted in an XhoI/XbaI fragment of 6.5 kb in size, and two XhoI fragments, of 6.5 kb 
and 1.5 kb in size, respectively, being cloned. 
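
As a simple plausibility check on the subcloning of phage clone P17, the sizes of the XhoI/BamHI and BamHI fragments stated above can be summed and compared with the stated insert size. The sketch below uses only the figures given in the preceding paragraph; the function and variable names are illustrative assumptions.

```python
def digest_total_kb(fragments_kb):
    """Sum fragment sizes in kb, rounded to one decimal place."""
    return round(sum(fragments_kb), 1)


# Figures from the text: three XhoI/BamHI fragments and one BamHI fragment.
p17_fragments_kb = [7.1, 4.2, 1.5, 1.8]
p17_insert_kb = 14.6  # stated approximate insert size

total = digest_total_kb(p17_fragments_kb)
difference = round(total - p17_insert_kb, 1)
print(total, difference)
```

Since the sizes are approximate gel estimates, a small non-zero difference would not by itself indicate a missing fragment.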

Digestion with the restriction enzyme XhoI was used to subclone the insert, of 
approx. 17.9 kb in size, in phage clone P2 in the form of subfragments. In all, three 
XhoI subfragments, of 7.5 kb, 6.4 kb and 1.6 kb in length, respectively, were cloned. 
Four SacI fragments, of 4.8 kb, 3 kb, 2 kb and 1.8 kb in size, respectively, were 
additionally subcloned by digesting with the restriction enzyme SacI. 

The insert, of approx. 13.5 kb in size, in phage clone P3 was subcloned by digesting 
with the restriction enzymes SacI and/or XhoI. Six SacI subfragments, of 3.2 kb, 
2 kb, 0.9 kb, 0.8 kb, 0.65 kb and 0.5 kb in length, respectively, and two XhoI 
subfragments, of 6.5 kb and 4.3 kb in length, respectively, were obtained in this 
connection. 
The insert, of approx. 13.2 kb in size, in phage clone P5 was subcloned by digesting 
with the restriction enzymes SacI and/or XhoI. In all, SacI fragments of 6.5 kb, 3.3 
kb, 3.2 kb, 0.8 kb and 0.3 kb in size, and XhoI fragments of 7 kb and 3.2 kb in size, 
were subcloned. 

In order to clone the hTC genomic sequence region located 3' of phage clone P17 
and 5' of phage clone P2, three genomic walking experiments were carried out using 
the Clontech Genome Walker™ kits (catalogue number K 1803-1) and various 
combinations of primers. In a final volume of 50 µl, 10 pmol of dNTP mix were 
added to 1 µl of human Genome Walker Library HDL (from Clontech), and a PCR 
reaction was carried out in 1x Klen Taq PCR reaction buffer and 1x Advantage Klen 
Taq polymerase mix (from Clontech). 10 pmol of an internal gene-specific primer, 
and 10 pmol of the adaptor primer AP1 (5'-GTAATACGACTCACTATAGGGC-3'; from 
Clontech) were added as primers. The PCR was carried out in 3 steps as a touchdown 
PCR. First of all, over 7 cycles, denaturation was carried out at 94°C for 20 sec, and 
the primers were then annealed, and the DNA chain extended, at 72°C for 4 min. 
There then followed 37 cycles in which the DNA was denatured at 94°C for 20 sec 
but the subsequent primer extension took place at 67°C for 4 min. In conclusion, 
there followed a chain extension at 67°C for 4 min. After this first PCR, the PCR 
product was diluted 1:50. One µl of this dilution was used in a second nested PCR 
together with 10 pmol of dNTP mix in 1x Klen Taq PCR reaction buffer and 
1x Advantage Klen Taq polymerase mix and also 10 pmol of a nested gene-specific 
primer and 10 pmol of the nested Marathon Adaptor primer AP2 
(5'-ACTATAGGGCACGCGTGGT-3'; from Clontech). The PCR conditions 
corresponded to the parameters which were selected in the first PCR. As the sole 
exception, only 5 cycles rather than 7 cycles were selected in the first PCR step and 
only 24 cycles, instead of 37 cycles, were run in the second PCR step. The products 
of this nested genomic walking PCR were cloned into the TA Cloning Vector pCRII 
from Invitrogen. 
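
The two cycling programs described above can be written down as a small data structure, which makes the cycle counts easy to compare. This is a minimal sketch under stated assumptions: the class and function names are illustrative, and the hold time is only the sum of the programmed steps, ignoring ramp times and any initial hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    cycles: int
    denature_s: int      # denaturation at 94 °C, in seconds
    extend_temp_c: int   # combined annealing/extension temperature
    extend_s: int        # annealing/extension time, in seconds


FINAL_EXTENSION_S = 240  # concluding chain extension at 67 °C for 4 min


def touchdown_program(first_cycles: int, second_cycles: int) -> list:
    """Two-stage program: 72 °C extension first, then 67 °C extension."""
    return [
        Stage(first_cycles, 20, 72, 240),
        Stage(second_cycles, 20, 67, 240),
    ]


def total_cycles(stages) -> int:
    return sum(s.cycles for s in stages)


def hold_time_s(stages) -> int:
    """Programmed hold time in seconds (ramp times are not included)."""
    cycling = sum(s.cycles * (s.denature_s + s.extend_s) for s in stages)
    return cycling + FINAL_EXTENSION_S


first_pcr = touchdown_program(7, 37)   # first PCR, as described in the text
nested_pcr = touchdown_program(5, 24)  # nested PCR, as described in the text
print(total_cycles(first_pcr), total_cycles(nested_pcr))
```

On these figures the first PCR runs 44 cycles and the nested PCR 29 cycles.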

In the first genomic walking, the gene-specific primer C3K2-GSP1 
(5'-GACGTGGCTCTTGAAGGCCTTG-3') and the nested gene-specific primer C3K2-GSP2 
(5'-GCCTTCTGGACCACGGCATACC-3') were used, together with the 
HDL library 4, and a PCR fragment of 1639 bp in length was obtained. In the second 
genomic walking, a PCR fragment of 685 bp in length was amplified from the HDL 
library 4 using the gene-specific primer C3F2 
(5'-CGTAGTTGAGCACGCTGAACAGTG-3') and the nested gene-specific primer 
C3F (5'-CCTTCACCCTCGAGGTGAGACGCT-3'). The third genomic walking 
mixture, using the gene-specific primer DEL5-GSP1 
(5'-GGTGGATGTGACGGGCGCGTACG-3') and the nested gene-specific primer 
C5K-GSP1 (5'-GGTATGCCGTGGTCCAGAAGGC-3'), led to a 924 bp PCR 
fragment being cloned from the HDL library 1. In all, 2100 bp of the genomic hTC 
region located 3' of phage clone P17 were identified using this genomic walking 
method (see Fig. 7). 
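
The relationship between two of the primers listed above can be checked with a short reverse-complement calculation. The sketch below uses the sequences of C3K2-GSP2 and C5K-GSP1 as printed in this section (any transcription error in the printed sequences would change the outcome); the helper name is an illustrative assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer sequences as printed in the text (5' to 3').
c3k2_gsp2 = "GCCTTCTGGACCACGGCATACC"
c5k_gsp1 = "GGTATGCCGTGGTCCAGAAGGC"

same_site = reverse_complement(c3k2_gsp2) == c5k_gsp1
print(same_site)
```

As printed, the two primers are exact reverse complements of each other, i.e. they anneal to the same site on opposite strands, which is consistent with walking in opposite directions from one position.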


The subcloned fragments, and the genomic walking products, were sequenced in 
single-stranded form. The Lasergene Biocomputing Software (DNASTAR Inc. 
Madison, Wisconsin, USA) was used to identify overlapping regions and form 
contigs. In all, 2 large contigs were assembled from the sequences collected from 
phage clones P12, P17, P2, P3 and P5, and also the sequence data from the genomic 
walking. Contig 1 consists of sequence data from phage clones P12 and P17 and the 
sequence data from the genomic walking. Contig 2 was put together from the 
sequences from phage clones P2, P3 and P5. Overlapping phage clone regions are 
shown diagrammatically in Fig. 7. The sequence data from the 2 contigs are shown 
below. The ATG start codon in contig 1 is underlined. The TGA stop codon is 
underlined in contig 2. 
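
The contig listings are printed in blocks of ten bases, seven blocks to a line, with the running position at the end of each line. A minimal sketch of how such a listing can be joined back into a plain sequence, while checking each printed position against the running length, is given below. The function name and the toy listing are assumptions for illustration; the toy lines are not taken from the contigs.

```python
def parse_listing(lines):
    """Join block-formatted sequence lines into one string.

    Each non-empty line is expected to hold space-separated base blocks
    followed by the running position of the last base on that line.
    """
    pieces = []
    total = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank separator lines
        *blocks, position = parts
        chunk = "".join(blocks)
        total += len(chunk)
        if int(position) != total:
            raise ValueError(
                f"printed position {position} does not match running length {total}"
            )
        pieces.append(chunk)
    return "".join(pieces)


# Toy listing in the same layout (two blocks per line for brevity).
toy_listing = ["ACGTACGTAC GGTTAACCGG 20", "", "TTTTTAAAAA CCCCCGGGGG 40"]
toy_sequence = parse_listing(toy_listing)
print(len(toy_sequence))
```

A check of this kind flags any line whose printed position does not agree with the number of bases read so far, which is a convenient way of locating transcription damage in a listing.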
Contig 1: 



ACTTGAGCCC AAGAGTTCAA GGCTACGGTG AGCCRTGATT GCAACACCAC ACGCCAGCCT TGGTGACAGA 70 

ATGAGACCCT GTCTCAAAAA AAAAAAAAAA AATTGAAATA ATATAAAGCA TCTTCTCTGG CCACAGTGGA 140 

ACAAAACCAG AAATCAACAA CAAGAGGAAT TTTGAAAACT ATACAAACAC ATGAAAATTA AACAATATAC 210 

TTCTGAATGA CCAGTGAGTC AATGAAGAAA TTAAAAAGGA AATTGAAAAA TTTATTTAAG CAAATGATAA 280 

CGGAAACATA ACCTCTCAAA ACCCACGGTA TACAGCAAAA GCAGTGCTAA GAAGGAAGTT TATAGCTATA 350 

AGCAGCTACA TCAAAAAAGT AGAAAAGCCA GGCGCAGTGG CTCATGCCTG TAATCCCAGC ACTTTGGGAG 420 

GCCAAGGCGG GCAGATCGCC TGAGGTCAGG AGTTCGAGAC CAGCCTGACC AACACAGAGA AACCTTGTCG 490 

CTACTAAAAA TACAAAATTA GCTGGGCATG GTGGCACATG CCTGTAATCC CAGCTACTCG GGAGGCTGAG 560 

GCAGGATAAC CGCTTGAACC CAGGAGGTGG AGGTTGCGGT GAGCCGGGAT TGCGCCATTG GACTCCAGCC 630 

TGGGTAACAA GAGTGAAACC CTGTCTCAAG AAAAAAAAAA AAGTAGAAAA ACTTAAAAAT ACAACCTAAT 700 

GATGCACCTT AAAGAACTAG AAAAGCAAGA GCAAACTAAA CCTAAAATTG GTAAAAGAAA AGAAATAATA 770 

AAGATCAGAG CAGAAATAAA TGAAACTGAA AGATAACAAT ACAAAAGATC AACAAAATTA AAAGTTGGTT 840 

TTTTGAAAAG ATAAACAAAA TTGACAAACC TTTGCCCAGA CTAAGAAAAA AGGAAAGAAG ACCTAAATAA 910 

ATAMGTCAG AGATGAAAAA AGAGACATTA CAACTGATAC CACAGAAATT CAAAGGATCA CTAGAGGCTA 980 

CTATGAGCAA CTGTACACTA ATAAATTGAA AAACCTAGAA AAAATAGATA AATTCCTAGA TGCATACAAC 1050 

CTACCAAGAT TGAACCATGA AGAAATCCAA AGCCCAAACA GACCAATAAC AATAATGGGA TTAAAGCCAT 1120 

AATAAAAAGT CTCCTAGCAA AGAGAAGCCC AGGACCCAAT GGCTTCCCTG CTGGATTTTA CCAATCATTT 1190 

AAAGAAGAAT GAATTCCAAT CCTACTCAAA CTATTCTGAA AAATAGAGGA AAGAATACTT CCAAACTCAT 1260 

TCTACATGGC CAGTATTACC CTGATTCCAA AACCAGACAA AAACACATCA AAAACAAACA AACAAAAAAA 1330 

CAGAAAGAAA GAAAACTACA GGCCAATATC CCTGATGAAT ACTGATACAA AAATCCTCAA CAAAACACTA 1400 

GCAAACCAAA TTAAACAACA CCTTCGAAAG ATCATTCATT GTGATCAAGT GGGATTTATT CCAGGGATGG 1470 

AAGGATGGTT CAACATATGC AAATCAATCA ATGTGATACA TCATCCCAAC AAAATGAAGT ACAAAAACTA 1540 

TATGATTATT TCACTTTATG CAGAAAAAGC ATTTGATAAA ATTCTGCACC CTTCATGATA AAAACCCTCA 1610 

AAAAACCAGG TATACAAGAA ACATACAGGC CAGGCACAGT GGCTCACACC TGCGATCCCA GCACTCTGGG 1680 

AGGCCAAGGT GGGATGATTG CTTGGGCCCA GGAGTTTGAG ACTAGCCTGG GCAACAAAAT GAGACCTGGT 1750 

CTACAAAAAA CTTTTTTAAA AAATTAGCCA GGCATGATGG CATATGCCTG TAGTCCCAGC TAGTCTGGAG 1820 

GCTGAGGTGG GAGAATCACT TAAGCCTAGG AGGTCGAGGC TGCAGTGAGC CATGAACATG TCACTGTACT 1890 

CCAGCCTAGA CAACAGAACA AGACCCCACT GAATAAGAAG AAGGAGAAGG AGAAGGGAGA AGGGAGGGAG 1960 

AAGGGAGGAG GAGGAGAAGG AGGAGGTGGA GGAGAAGTGG AAGGGGAAGG GGAAGGGAAA GAGGAAGAAG 2030 

AAGAAACATA TTTCAACATA ATAAAAGCCC TATATGACAG ACCGAGGTAG TATTATGAGG AAAAACTGAA 2100 

AGCCTTTCCT CTAAGATCTG GAAAATGACA AGGGCCCACT TTCACCACTG TGATICAACA TAGTACTAGA 2170 

AGTCCTAGCT AGAGCAATCA GATAAGAGA-A AGAAATAAAA GGCATCCAAA CTGGAAAGGA AGAAGTCAAA 2240 

TTATCCTGTT TGCAGATGAT ATGATCTTAT ATCTGGAAAA GACTTAAGAC ACCACTAAAA AACTATTAGA 2310 

GCTGAAATTT GGTACAGCAG GATACAAAAT CAATGTACAA AAATCAGTAG TATTTCTATA TTCCAACAGC 2380 

AAACAATCTG AAAAAGAAAC CAAAAAAGCA GCTACAAATA AAATTAAACA GCTAGGAATT AACCAAAGAA 2450 

GTGAAAGATC TCTACAATGA AAACTATAAA ATGTTGATAA AAGAAATTGA AGAGGGCACA AAAAAAGAAA 2520 

AGATATTCCA TGTTCATAGA TTGGAAGAAT AAATACTGTT AAAATGTCCA TACTACCCAA AGCAATTTAC 2590 

AAATTCAATG CAATCCCTAT TAAAATACTA ATGACGTTCT TCACAGAAAT AGAAGAAACA ATTCTAAGAT 2660 

TTGTACAGAA CCACAAAAGA CCCAGAATAG CCAAAGCTAT CCTGACCAAA AAGAACAAAA CTGGAAGCAT 2730 

CACATTACCT GACTTCAAAT TATACTACAA AGCTATAGTA ACCCAAACTA CATGGTACTG GCATAAAAAC 2800 

AGATGAGACA TGGACCAGAG GAACAGAATA GAGAATCCAG AAACAAATCC ATGCATCTAC AGTGAACTCA 2870 

TTTTTGACAA AGGTGCCAAG AACATACTTT GGGGAAAAGA TAATCTCTTC AATAAATGGT GCTGGAGGAA 2940 

CTGGATATCC ATATGCAAAA TAACAATACT AGAACTCTGT CTCTCACCAT ATACAAAAGC AAATCAAAAT 3010 

GGATGAAAGG CTTAAATCTA AAACCTCAAA CTTTGCAACT ACTAAAAGAA AACACCGGAG AAACTCTCCA 3080 

GGACATTGGA GTGGGCAAAG ACTTCTTGAG TAATTCCCTG CAGGCACAGG CAACCAAAGC AAAAACAGAC 3150 

AAATGGGATC ATATCAAGTT AAAAAGCTTC TGCCCAGCAA AGGAAACAAT CAACAAAGAG AAGAGACAAC 3220 

CCACAGAATG GGAGAATATA TTTGCAAACT ATTCATCTAA CAAGGAATTA ATAACCAGTA TATATAAGGA 3290 

GCTCAAACTA CTCTATAAGA AAAACACCTA ATAAGCTGAT TTTCAAAAAT AAGCAAAAGA TCTGGGTAGA 3360 

CATTTCTCAA AATAAGTCAT ACAAATGGCA AACAGGCATC TGAAAATGTG CTCAACACCA CTGATCATCA 3430 

GAGAAATGCA AATCAAAACT ACTATGAGAG ATCATCTCAT CCCAGTTAAA ATGGCTTTTA TTCAAAAGAC 3500 

AGGCAATAAC AAATGCCAGT GAGGATGTGG ATAAAAGGAA ACCCTTGGAC ACTGTTGGTG GGAATGGAAA 3570 

TTGCTACCAC TATGGAGAAC AGTTTGAAAG TTCCTCAAAA AACTAAAAAT AAAGCTACCA TACAGCAATC 3640 

CCATTGCTAG GTATATACTC CAAAAAAGGG AATCAGTGTA TCAACAAGCT ATCTCCACTC CCACATTTAC 3710 

TGCAGCACTG TTCATAGCAG CCAAGGTTTG GAAGCAACCT CAGTGTCCAT CAACAGACGA ATGGAAAAAG 3780 

AAAATGTGGT GCACATACAC AATGGAGTAC TACGCAGCCA TAAAAAAGAA TGAGATCCTG TCAGTTGCAA 3850 

CAGCATGGGG GGCACTGGTC AGTATGTTAA GTGAAATAAG CCAGGCACAG AAAGACAAAC TTTTCATGTT 3920 

CTCCCTTACT TGTGGGAGCA AAAATTAAA_-. CAATTGACAT AG.AAATAGAG GAGAATGGTG GTTCTAGAGG 3990 

GGTGGGGGAC AGGGTGACTA GAGTCAAC.---. TAATTTATTG TATGTTTTAA AATAACTAAA AGAGT^T.ajiT 4060 

TGGGTTGTTT GTAACACAAA GAAAGGATA-A ATGCTTGAAG GTGACAGATA CCCCATTTAC CCTGATGTGA 4130 

TTATTACACA TTGTATGCCT GTATCAAAAT ATCTCATGTA TGCTATAGAT ATAAACCCTA CTATATTA.^A 4200 

AATTA.aAATT TTAATGGCCA GGCACGGTGG CTCATGTCCG TAATCCCAGC ACTTTGGGAG GCCGAGGCGG 4270 

GTGGATCACC TGAGGTCAGG AGTTTGAA_AC CAGTCTGGCC ACCATGATGA AACCCTGTCT CTACT^AAGA 4340 

TAC^AAAATT AGCCAGGCGT GGTGGCACAT ACCTGTAGTC CCAACTACTC AGGAGGCTGA GACAGGAGAA 4410 

TTGCTTGAAC CTGGGAGGCG GAGGTTGCAG TGAGCCGAGA TCATGCCACT GCACTGCAGC CTGGGTGACA 4480 

GAGCAAGACT CCATCTCAAA ACAAAAAC.AA AAAAAAGAAG ATTAAJVATTG TAATTTTTAT GTACCGTATA 4550 

AATATATACT CTACTATATT AGAAGTTAA.-. AATTAAAACA ATTATAAAAG GTAATTAACC ACTTAA.TCTA 4620 

AA.iTAAGAAC AATGTATGTG GGGTTTCTAG CTTCTGAAGA AGTA_AAAGTT ATGGCCACGA TGGCACAAAT 4690 

GTG^GGAGGG AACAGTGGAA GTTACTGTTG TTAGACGCTC ATACTCTCTG TAAGTGACTT AATTTTAACC 4760 

A.AAGACAGGC TGGGAGAAGT TAAAGAGGCA TTCTATAAGC CCTAAJVACAA CTGCTAAT.3J, TGGTGA.AAGG 4830 

TAATCTCTAT TAATTACCAA TAATTACAGA TATCTCTAAA ATCGAGCTGC AGAATTGGCA CGTCTC^TCn 4900 

CACCGTCCTC TCATTCACGG TGCTTTTTTT CTTGTGTGCT TGGAGATTTT CGATTGTGTG TTCGTGTTTG 4970 

GTTViACTTA ATCTGTATGA ATCCTGAAA.C GAAAAATGGT GGTGATTTCC TCCAGAAGAA TTAGAGTACC 5040 

TGGCAGGAAG CAGGTGGCTC TGTGGACCTG AGCCACTTCA ATCTTCAA.GG GTCTCTGGCC AAGACCCAGG 5110 
TGCAAGGCAG AGGCCTGATG ACCCGAGGAC AGGAAAGCTC GGATGGGAAG GGGCGATGAG AAGCCTGCCT 5180 

CGTTGGTGAG CAGCGCATGA AGTGCCCTTA TTTACGCTTT GCAAAGATTG CTCTGGATAC CATCTGGAAA 5250 

AGGCGGCCAG CGGGAATGCA AGGAGTCAGA AGCCTCCTGC TCAAACCCAG GCCAGCAGCT ATGGCGCCCA 5320 

CCCGGGCGTG TGCCAGAGGG AGAGGAGTCA AGGCACCTCG AAGTATGGCT TAAATCTTTT TTTCACCTGA 5390 

AGCAGTGACC AAGGTGTATT CTGAGGGAAG CTTGAGTTAG GTGCCTTCTT TAAAACAGAA AGTCATGGAA 5460 

GCACCCTTCT CAAGGGAAAA CCAGACGCCC GCTCTGCGGT CATTTACCTC TTTCCTCTCT CCCTCTCTTG 5530 

CCCTCGCGGT TTCTGATCGG GACAGAGTGA CCCCCGTGGA GCTTCTCCGA GCCCGTGCTG AGGACCCTCT 5600 

TGCAAAGGGC TCCACAGACC CCCGCCCTGG AGAGAGGAGT CTGAGCCTGG CTTAATAACA AACTGGGATG 5670 

TGGCTGGGGG CGGACAGCGA CGGCGGGATT CAAAGACTTA ATTCCATGAG TAAATTCAAC CTTTCCACAT 5740 

CCGAATGGAT TTGGATTTTA TCTTAATATT TTCTTAAATT TCATCAAATA ACATTCAGGA CTGCAGAAAT 5810 

CCAAAGGCGT AAAACAGGAA CTGAGCTATG TTTGCCAAGG TCCAAGGACT TAATAACCAT GTTCAGAGGG 5880 

ATTTTTCGCC CTAAGTACTT TTTATTGGTT TTCATAAGGT GGCTTAGGGT GCAAGGGAAA GTACACGAGG 5950 

AGAGGCCTGG GCGGCAGGGC TATGAGCACG GCAGGGCCAC CGGGGAGAGA GTCCCCGGCC TGGGAGGCTG 6020 

ACAGCAGGAC CACTGACCGT CCTCCCTGGG AGCTGCCACA TTGGGCAACG CGAAGGCGGC CACGCTGCGT 6090 

GTGACTCAGG ACCCCATACC GGCTTCCTGG GCCCACCCAC ACTAACCCAG GAAGTCACGG AGCTCTGAAC 6160 

CCGTGGAAAC GAACATGACC CTTGCCTGCC TGCTTCCCTG GGTGGGTCAA GGGTAATGAA GTGGTGTGCA 6230 

GGAAATGGCC ATGTAAATTA CACGACTCTG CTGATGGGGA CCGTTCCTTC CATCATTATT CATCTTCACC 6300 

CCCAAGGACT GAATGATTCC AGCAACTTCT TCGGGTGTGA CAAGCCATGA CAAAACTCAG TACAAACACC 6370 

ACTCTTTTAC TAGGCCCACA GAGCACGGSC CACACCCCTG ATATATTAAG AGTCCAGGAG AGATGAGGCT 6440 

GCTTTCAGCC ACCAGGCTGG GGTGACAACA GCGGCTGAAC AGTCTGTTCC TCTAGACTAG TAGACCCTGG 6510 

CAGGCACTCC CCCAGATTCT AGGGCCTGGT TGCTGCTTCC CGAGGGCGCC ATCTGCCCTG GAGACTCAGC 6580 

CTGGGGTGCC ACACTGAGGC CAGCCCTGTC TCCACACCCT CCGCCTCCAG GCCTCAGCTT CTCCAGCAGC 6650 

TTCCTAAACC CTGGGTGGGC CGTGTTCCAG CGCTACTGTC TCACCTGTCC CACTGTGTCT TGTCTCAGCG 6720 

ACGTAGCTCG CACGGTTCCT CCTCACATGG GGTGTCTGTC TCCTTCCCCA ACACTCACAT GCGTTGAAGG 6790 

GAGGAGATTC TGCGCCTCCC AGACTGGCTC CTCTGAGCCT GAACCTGGCT CGTGGCCCCC GATGCAGGTT 6860 

CCTGGCGTCC GGCTGCACGC TGACCTCCAT TTCCAGGCGC TCCCCGTCTC CTGTCATCTG CCGGGGCCTG 6930 

CCGGTGTGTT CTTCTGTTTC TGTGCTCCTT TCCACGTCCA GCTGCGTGTG TCTCTGCCCG CTAGGGTCTC 7000 

GGGGTTTTTA TAGGCATAGG ACGGGGGCGT GGTGGGCCAG GGCGCTCTTG GGAAATGCAA CATTTGGGTG 7070 

TGAAAGTAGG AGTGCCTGTC CTCACCTAGG TCCACGGGCA CAGGCCTGGG GATGGAGCCC CCGCCAGGGA 7140 

CCCGCCCTTC TCTGCCCAGC ACTTTCCTGC CCCCCTCCCT CTGGAACACA GAGTGGCAGT TTCCACAAGC 7210 

ACTAAGCATC CTCTTCCCAA AAGACCCAGC ATTGGCACCC CTGGACATTT GCCCCACAGC CCTGGGAATT 7280 

CACGTGACTA CGCACATCAT GTACACACTC CCGTCCACGA CCGACCCCCG CTGTTTTATT TTAATAGCTA 7350 

CAAAGCAGGG AAATCCCTGC TAAAATGTCC TTTAACAAAC TGGTTAAACA AACGGGTCCA TCCGCACGGT 7420 

GGACAGTTCC TCACAGTGAA GAGGAACATG CCGTTTATAA AGCCTGCAGG CATCTCAAGG GAATTACGCT 7490 

GAGTCAAAAC TGCCACCTCC ATGGGATACG TACGCAACAT GCTCAAAAAG AAAGAATTTC ACCCCATGGC 7560 

AGGGGAGTGG TTAGGGGGGT TAAGGACGGT GGGGGCGGCA GCTGGGGGCT ACTGCACGCA CCTTTTACTA 7630 

AAGCCAGTTT CCTGGTTCTG ATGGTATTGG CTCAGTTATG GGAGACTAAC CATAGGGGAG TGGGGATGGG 7700 

GGAACCCGGA GGCTGTGCCA TCTTTGCCAT GCCCGAGTGT CCTGGGCAGG ATAATGCTCT AGAGATGCCC 7770 

ACGTCCTGAT TCCCCCAAAC CTGTGGACAG AACCCGCCCG GCCCCAGGGC CTTTGCAGGT GTGATCTCCG 7840 

TGAGGACCCT GAGGTCTGGG ATCCTTCGGG ACTACCTGCA GGCCCGAAAA GTAATCCAGG GGTTCTGGGA 7910 

AGAGGCGGGC AGGAGGGTCA GAGGGGGGCA GCCTCAGGAC GATGGAGGCA GTCAGTCTGA GGCTGAAAAG 7980 

GGAGGGAGGG CCTCGAGCCC AGGCCTGCAA GCGCCTCCAG AAGCTGGAAA AAGCGGGGAA GGGACCCTCC 8050 

ACGGAGCCTG CAGCAGGAAG GCACGGCTGG CCCTTAGCCC ACCAGGGCCC ATCGTGGACC TCCGGCCTCC 8120 

GTGCCATAGG AGGGCACTCG CGCTGCCCTT CTAGCATGAA GTGTGTGGGG ATTTGCAGAA GCAACAGGAA 8190 

ACCCATGCAC TGTGAATCTA GGATTATTTC AAAACAAAGG TTTACAGAAA CATCCAAGGA CAGGGCTGAA 8260 

GTGCCTCCGG GCAAGGGCAG GGCAGGCACG AGTGATTTTA TTTAGCTATT TTATTTTATT TACTTACTTT 8330 

CTGAGACAGA GTTATGCTCT TGTTGCCCAG GCTGGAGTGC AGCGGCATGA TCTTGGCTCA CTGCAACCTC 8400 

CGTCTCCTGG GTTCAAGCAA TTCTCGTGCC TCAGCCTCCC AAGTAGCTGG GATTTCAGGC GTGCACCACC 8470 

ACACCCGGCT AATTTTGTAT TTTTAGTAGA GATGGGCTTT CACCATGTTG GTCAAGCTGA TCTCAAAATC 8540 

CTGACCTCAG GTGATCCGCC CACCTCAGCC TCCCAAAGTG CTGGGATTAC AGGCATGAGC CACTGCACCT 8610 

GGCCTATTTA ACCATTTTAA AACTTCCCTG GGCTCAAGTC ACACCCACTG GTAAGGAGTT CATGGAGTTC 8680 

AATTTCCCCT TTACTCAGGA GTTACCCTCC TTTGATATTT TCTGTAATTC TTCGTAGACT GGGGATACAC 8750 

CGTCTCTTGA CATATTCACA GTTTCTGTGA CCACCTGTTA TCCCATGGGA CCCACTGCAG GGGCAGCTGG 8820 

GAGGCTGCAG GCTTCAGGTC CCAGTGGGGT TGCCATCTGC CAGTAGAAAC CTGATGTAGA ATCAGGGCGC 8890 

AAGTGTGGAC ACTGTCCTGA ATCTCAATGT CTCAGTGTGT GCTGAAACAT GTAGAAATTA AAGTCCATCC 8960 

CTCCTACTCT ACTGGGATTG AGCCCCTTCC CTATCCCCCC CCAGGGGCAG AGGAGTTCCT CTCACTCCTG 9030 

TGGAGGAAGG AATGATACTT TGTTATTTTT CACTGCTGGT ACTGAATCCA CTGTTTCATT TGTTGGTTTG 9100 

TTTGTTTTGT TTTGAGAGGC GGTTTCACTC TTGTTGCTCA GGCTGGAGGG AGTGCAATGG CGCGATCTTG 9170 

GCTTACTGCA GCCTCTGCCT CCCAGGTTCA AGTGATTCTC CTGCTTCCGC CTCCCATTrC GCTGGGATTA 9240 

CAGGCACCCG CCACCATGCC CAGCTAATTT TTTGTATTTT TAGTAGAGAC GGGGGTGGGT GGGGTTCACC 9310 

ATGTTGGCCA GGCTGGTCTC GAACTTCTGA CCTCAGATGA TCCACCTGCC TCTGCCTCCT AAAGTGCTGG 9380 

GATTACAGGT GTGAGCCACC ATGCCCAGCT CAGAATTTAC TCTGTTTAGA AACATCTGGG TCTGAGGTAG 9450 

GAAGCTCACC CCACTCAAGT GTTGTGGTGT TTTAAGCCAA TGATAGAATT TTTTTATTGT TGTTAGAACA 9520 

CTCTTGATGT TTTACACTGT GATGACTAAG ACATCATCAG CTTTTCAAAG ACACACTAAC TGCACCCATA 9590 

ATACTGGGGT GTCTTCTGGG TATCAGCAAT CTTCATTGAA TGCCGGGAGG CGTTTCCTCG CCATGCACAT 9660 

GGTGTTAATT ACTCCAGCAT AATCTTCTGC TTCCATTTCT TCTCTTCCCT CTTTTAAAAT TGTGTTTTCT 9730 

ATGTTGGCTT CTCTGCAGAG AACCAGTGTA AGCTACAACT TAACTTTTGT TGGAACAAAT TTTCCAAACC 9800 

GCCCCTTTGC CCTAGTGGCA GAGACAATTC ACAAACACAG CCCTTTAAAA AGGCTTAGGG ATCACTAAGG 9870 

GGATTTCTAG AAGAGCGACC TGTAATCCTA AGTATTTACA AGACGAGGCT AACCTCCAGC GAGCGTGACA 9940 

GCCCAGGGAG GGTGCGAGGC CTGTTCAAAT GCTAGCTCCA T7WVTAAAGC AATTTCCTCC GGCAGTTTCT 10010 

GAAAGTAGGA AAGGTTACAT TTAAGGTTGC GTTTGTTAGC ATTTCAGTGT TTGCCGACC7 CAGCTACAGC 10080 

ATCCCTGCAA GGCCTCGGGA GACCCAGAAG TTTCTCGCCC CCTTAGATCC AAACTTGAGC AACCCGGAGT 10150 

CTGGATTCCT GGGAAGTCCT CAGCTGTCCT GCGGTTGTGC CGGGGCCCCA GGTCTGGAGG GGACCAGTGG 10220 

CCGTGTGGCT TCTACTGCTG GGCTGGA-i^GT CGGGCCTCCT AGCTCTGCAG TCCGAGGCTT GGAGCCAGGT 10290 

GCCTGGACCC CGAGGCTGCC CTCCACCCTG TGCGGGCGGG ATGTGACCAG ATGTTGGCCT CATCTGCCAG 10360 

ACAGAGTGCC GGGGCCCAGG GTCAAGGCCG TTGTGGCTGG TGTGAGGCGC CCGGTGCGCG GCCAGCAGGA 10430 

GCGCCTGGCT CCATTTCCCA CCCTTTCTCG ACGGGACCGC CCCGGTGGGT GATTAACAGA TTTGGGGTGG 10500 

TTTGCTCATG GTGGGGACCC CTCGCCGCCT GAGAACCTGC .W^GAGAAAT GACGGGCCTG TGTCAAGGAG 10570 
CCCAAGTCGC GGGGAAGTGT TGCAGGGAGG CACTCCGGGA GGTCCCGCGT GCCCGTCCAG GGAGCAATGC 10640 
GTCCTCGGGT TCGTCCCCAG CCGCGTCTAC GCGCCTCCGT CCTCCCCTTC ACGTCCGGCA TTCGTGGTGC 10710 
CCGGAGCCCG ACGCCCCGCG TCCGGACCTG GAGGCAGCCC TGGGTCTCCG GATCAGGCCA GCGGCCAAAG 10780 
GGTCGCCGCA CGCACCTGTT CCCAGGGCCT CCACATCATG GCGCCTCCGT CGGGTTACCC CACAGCCTAG 10850 
GCCGATTCGA CCTCTCTCCG CTGGGGCCCT CGCTGGCGTC CCTGCACCCT GGGAGCGCGA GCGGCGCGCG 10920 
GGCGGGGAAG CGCGGCCCAG ACCCCCGGGT CCGCCCGGAG CAGCTGCGCT GTCGGGGCCA GGCCGGGCTC 10990 
CCAGTGGATT CGCGGGCACA GACGCCCAGG ACCGCGCTCC CCACGTGGCG GAGGGACTGG GGACCCGGGC 11060 
ACCCGTCCTG CCCCTTCACC TTCCAGCTCC GCCTCCTCCG CGCGGACCCC GCCCCGTCCC GACCCCTCCC 11130 
GGGTCCCCGG CCCAGCCCCC TCCGGGCCCT CCCAGCCCCT CCCCTTCCTT TCCGCGGCCC CGCCCTCTCC 11200 
TCGCGGCGCG AGTTTCAGGC AGCGCTGCGT CCTGCTGCGC ACGTGGGAAG CCCTGGCCCC GGCCACCCCC 11270 
GCGATGCCGC GCGCTCCCCG CTGCCGAGCC GTGCGCTCCC TGCTGCGCAG CCACTACCGC GAGGTGCTGC 11340 
CGCTGGCCAC GTTCGTGCGG CGCCTGGGGC CCCAGGGCTG GCGGCTGGTG CAGCGCGGGG ACCCGGCGGC 11410 
TTTCCGCGCG CTGGTGGCCC AGTGCCTGGT GTGCGTGCCC TGGGACGCAC GGCCGCCCCC CGCCGCCCCC 11480 
TCCTTCCGCC AGGTGGGCCT CCCCGGGGTC GGCGTCCGGC TGGGGTTGAG GGCGGCCGGG GGGAACCAGC 11550 
GACATGCGGA GAGCAGCGCA GGCGACTCAG GGCGCTTCCC CCGCAGGTGT CCTGCCTGAA GGAGCTGGTG 11620 
GCCCGAGTGC TGCAGAGGCT GTGCGAGCGC GGCGCGAAGA ACGTGCTGGC CTTCGGCTTC GCGCTGCTGG 11690 
ACGGGGCCCG CGGGGGCCCC CCCGAGGCCT TCACCACCAG CGTGCGCAGC TACCTGCCCA ACACGGTGAC 11760 
CGACGCACTG CGGGGGAGCG GGGCGTGGGG GCTGCTGCTG CGCCGCGTGG GCGACGACGT GCTGGTTCAC 11830 
CTGCTGGCAC GCTGCGCGCT CTTTGTGCTG GTGGCTCCCA GCTGCGCCTA CCAGGTGTGC GGGCCGCCGC 11900 
TGTACCAGCT CGGCGCTGCC ACTCAGGCCC GGCCCCCGCC ACACGCTAGT GGACCCCGAA GGCGTCTGGG 11970 
ATGCGAACGG GCCTGGAACC ATAGCGTCAG GGAGGCCGGG GTCCCCCTGG GCCTGCCAGC CCCGGGTGCG 12040 
AGGAGGCGCG GGGGCAGTGC CAGCCGAAGT CTGCCGTTGC CCAAGAGGCC CAGGCGTGGC GCTGCCCCTG 12110 
AGCCGGAGCG GACGCCCGTT GGGCAGGGGT CCTGGGCCCA CCCGGGCAGG ACGCGTGGAC CGAGTGACCG 12180 
TGGTTTCTGT GTGGTGTCAC CTGCCAGACC CGCCGAAGAA GCCACCTCTT TGGAGGGTGC GCTCTCTGGC 12250 
ACGCGCCACT CCCACCCATC CGTGGGCCGC CAGCACCACG CAGGCCCCCC ATCCACATCG CGGCCACCAC 12320 
GTCCCTGGGA CACGCCTTGT CCCCCGGTGT ACGCCGAGAC CAAGCACTTC CTCTACTCCT CAGGCGACAA 12390 
GGAGCAGCTG CGGCCCTCCT TCCTACTCAG CTCTCTGAGG CCCAGCCTGA CTGGCGCTCG GAGGCTCGTG 12460 
GAGACCATCT TTCTGGGTTC CAGGCCCTGG ATGCCAGGGA CTCCCCGCAG GTTGCCCCGC CTGCCCCAGC 12530 
GCTACTGGCA AATGCGGCCC CTGTTTCTGG AGCTGCTTGG GAACCACGCG CAGTGCCCCT ACGGGGTGCT 12600 
CCTCAAGACG CACTGCCCGC TGCGAGCTGC GGTCACCCCA GCAGCCGGTG TCTGTGCCCG GGAGAAGCCC 12670 
CAGGGCTCTG TGGCGGCCCC CGAGGAGGAG GACACAGACC CCCGTCGCCT GGTGCAGCTG CTCCGCCAGC 12740 
ACAGCAGCCC CTGGCAGGTG TACGGCTTCG TGCGGGCCTG CCTGCGCCGG CTGGTGGCCC CAGGGCTCTG 12810 
GGGCTCCAGG CACAACGAAC GCCGCTTCCT CAGGAACACC AAGAAGTTCA TCTCCCTGGG GAAGCATGCC 12880 
AAGCTCTCGC TGCAGGAGCT GACGTGGAAG ATGAGCGTGC GGGACTGCGC TTGGCTGCGC AGGAGCCCAG 12950 
GTGAGGAGGT GGTGGCCGTC GAGGGCCCAG GCCCCAGAGC TGAATGCAGT AGGGGCTCAG AAAAGGGGGC 13020 
AGGCAGAGCC CTGGTCCTCC TGTCTCCATC GTCACGTGGG CACACGTGGC TTTTCGCTCA GGACGTCGAG 13090 
TGGACACGGT GATCTCTGCC TCTGCTCTCC CTCCTGTCCA GTTTGCATAA ACTTACGAGG TTCACCTTCA 13160 
CGTTTTGATG GACACGCGGT TTCCAGGCGC CGAGGCCAGA GCAGTGAACA GAGGAGGCTG GGCGCGGCAG 13230 
TGGAGCCGGG TTGCCGGCAA TGGGGAGAAG TGTCTGGAAG CACAGACGCT CTGGCGAGGG TGCCTGCAGG 13300 
TTACCTATAA TCCTCTTCGC AATTTCAAGG GTGGGAATGA GAGGTGGGGA CGAGAACCCC CTCTTCCTGG 13370 
GGGTGGGAGG TAAGGGTTTT GCAGGTGCAC GTGGTCAGCC AATATGCAGG TTTGTGTTTA AGATTTAATT 13440 
GTGTGTTGAC GGCCAGGTGC GGTGGCTCAC GCCGGTAATC CCAGCACTTT GGGAAGCTGA GGCAGGTGGA 13510 
TCACCTGAGG TCAGGAGTTT GAGACCAGCC TGACCAACAT GGTGAAACCC TATCTGTACT AAAAATACAA 13580 
AAATTAGCTG GGCATGGTGG TGTGTGCCTG TAATCCCAGC TACTTGGGAG GCTGAGGCAG GAGAATCACT 13650 
TGAACCCAGG AGGCGGAGGC TGCAGTGAGC TGAGATTGTG CCATTGTACT CCAGCCTGGG CGACAAGAGT 13720 
GAAACTCTGT CTTTAAAAAA AAAAAGTGTT CGTTGATTGT GCCAGGACAG GGTAGAGGGA GGGAGATAAG 13790 
ACTGTTCTCC AGCACAGATC CTGGTCCCAT CTTTAGGTAT GAAGAGGGCC ACATGGGAGC AGAGGACAGC 13860 
AGATGGCTCC ACCTGCTGAG GAAGGGACAG TGTTTGTGGG TGTTCAGGGG ATGGTGCTGC TGGGCCCTGC 13930 
CGTGTCCCCA CCCTGTTTTT CTGGATTTGA TGTTGAGGAA CCTCCGCTCC AGCCCCCTTT TGGCTCCCAG 14000 
TGCTCCCAGG CCCTACCGTG GCAGCTAGAA GAAGTCCCGA TTTCACCCCC TCCCCACAAA CTCCCAAGAC 14070 
ATGTAAGACT TCCGGCCATG CAGACAAGGA GGGTGACCTT CTTGGGGCTC TTTTTTTTCT TTTTTTCTTT 14140 
TTATGGTGGC AAAAGTCATA TAACATGAGA TTGGCACTCC TAACACCGTT TTCTGTGTAC AGTGCAGAAT 14210 
TGCTAACTCG GCGGTGTTTA CAGCAGGTTG CTTGAAATGC TGCGTCTTGC GTGACTGGAA GTCCCTACCC 14280 
ATCGAJ\CGGC AGCTGCCTCA CACCTGCTGC GGCTCAGGTG GACCACGCCG AGTCAGATAA GCGTCATGCA 14350 
ACCCAGTTTT GCTTTTTGTG CTCCAGCTTC CTTCGTTGAG GAGAGTTTGA GTTCTCTGAT CAGGACTCTG 14420 
CCTGTCATTG CTGTTCTCTG ACTTCAGATG AGGTCACAAT CTGCCCCTGG CTTATGCAGG GAGTGAGGCG 14490 
TGGTCCCCGG GTGTCCCTGT CACGTGCAGG GTGAGTGAGG CGTTGCCCCC AGGTGTCCCT GTCACGTGTA 14560 
GGGTGAGTGA GGCGCGGCCC CCGGGTGTCC CTGTCCCGTG CAGCGTGATT GAGGTGTGGC CCCCGGGTGT 14630 
CCCTGTCACG TGTAGGGTGA GTGAGGCGCC ATCCCCGGGT GTCCCTGTCA CGTGTAGGGT GAGTGAGGCG 14700 
TGGTCCCCGG GTGTCCCTGT CCCGTGCAGG GTGAGTGAGG CACTGTCCCC GGGTGTCCCT GTCACGTGCA 14770 
GGGTGAGTGA GGCGCGGTCC CCGGGTGTCC CTCTCAGGTG TAGGGTGAGT GAGGCGCGGC CCCAGGGTGT 14840 
CCCTGTCACG TGTAGGGTGA GTGAG3CACC GTCCCTGGGT GTCCCTCCCA GGTATAGGGT GAGTGAGGCA 14910 
CTGTCCCCGG GTGTCCCTGT CACGTGCAGG GTGAGTGAGG CGCGGCCCCC GGGTGTCCCT CTCAGGTGCA 14980 
GGGTGAGTGA GGCGCTGTCC CTGGGTGTCC CTGTCTCGTG TAGGGTGAGT GAGGCTCTGT CCGCAGGTGT 15050 
CCTTGGCGTT TGCTCACTTG AGCTTGCTCC TGAATGTTTG CTCTTTCTAT AGCCACAGCT GCGCCGGTTG 15120 
CCCATTGCCT GGGTAGATGG TGCAGGCGCA GTGCTGGTCC CCAAGCCTAT CTTTTCTGAT GCTCGGCTCT 15190 
TCTTGGTCAC CTCTCCGTTC CATTTTGCTA CGGG3ACACG GGACTGCAGG CTCTCGCCTC CCGCGTGCCA 15260 
GGCACTGCAG CCACAGCTTC AGGTCCGCTT GCCTCTGTTG GGCCTGGCTT GCTCACCACG TGCCCGCCAC 15330 
ATGCATGCTG CCAATACTCC TCTCCCAGCT TGTCTCATGC CGAGGCTGGA CTCTGGGCTG CCTGTGTCTG 15400 
CTGCCACGTG TTGCTGGAGA CATCCCAGAA AGGGTTCTCT GTGCCCTGAA GGAAAGCAAG TCACCCCAGC 15470
CCCCTCACTT GTCCTGTTTT CTCCCAAGCT GCCCCTCTGC TTGGCCCCCT TGGGTGGGTG GCAACGCTTG 15540 
TCACCTTATT CTGGGCACCT GCC3CTCATT GCTT-.GGCTG GGCTCTGCCT CCAGTCGCCC CCTCACATGG 15610 
ATT3^CGTCC AGCCACAGGT TGGAG7GTCT CTGTCTGTCT CCTGCTCTGA GACCCACGTG GAGGGCCGGT 15680 
GTCTCCGCCA GCCTTCGTCA GACTTCCCTC TTGGGTCTTA GTTTTGAATT TCACTGATTT ACCTCTGACG 15750 
TTTCTATCTC TCCATTGTAT GCTTTTTCTT GGTTTATTCT TTCATTCCTT TTCTAGCTTC TTAGTTTAGT 15820 
CATGCCTTTC CCTCTAASTG CTGCCTTACC TGCACCCTGT GTTTTGATGT GAAGTAATCT CAACATCAGC 15890 
CACTTTCAAG TGTTCTTAAA -.TACTTCA.aA GTGTTAATAC TTCTTTTAAG TATTCTTATT CTGTGATTTT 15960
TTTCTTTGTG CACGCTGTGT 7TTGA;GTGA ^ATC-.TTTTG ATATCAGTGA CTTTTAAGTA TTCTTTAGCT 16030
TATTCTGTGA TTTCTTTGAG CAGTGAGTTA TTTGAACACT GTTTATGTTC AAGATATGTA GAGTRTCAAG 16100
ATACGTAGAG TATTTTAAGT TATCATTTTA TTATTGATTT CTAACTCAGT TGTGTAGTGG TCTGTATAAT 16170
ACCAATTATT TGAAGTTTGC GGAGCCTTGC TTTGTGATCT AGTGTGTGCA TGGTTTCCAG AACTGTCCAT 16240
TGTAAATTTG ACATCCTGTC AATAGTGGGC ATGCATGTTC ACTATATCCA GCTTATTAAG GTCCAGTGCA 16310
AAGCTTCTGT CTCCTTCTAG ATGCATGAAA TTCCAAGAAG GAGGCCATAG TCCCTCACCT GGGGGATGGG 16380
TCTGTTCATT TCTTCTCGTT TGGTAGCATT TATGTGAGGC ATTGTTAGGT GCATGCACGT GGTAGAATTT 16450
TTATCTTCCT GATGAGTGAA TCTTTTGGAG ACTTCTATGT CTCTAGTAAT CTAGTAATTC TTTTTTTAAA 16520
TTGCTCTTAG TACTGCCACA CTGGGCTTCT TTTGATTAGT ATTTTCCTGC TGTGTCTGTT TTCTGCCTTT 16590
AATTTATATA TATATATATA ttTTTTTTTT TTTTGAGACA GAGTCTTGGT CTGTCGCCCA GGGTGAGTGC 16660
AGTGGTGTGA TCACAGGTCA GTGTAACTTT TACCTTCTGG CCTGAGCCGT CCTCTCACCT CAGCCTCCTG 16730
AGTAGCTGGA ACTGCAGACA CGCACCGCTA CACCTGGCTA ^^^^^^j^^^ TTTTTCTGGA GACAGGGTCT 16800
tgctgtgttg cccaggctgg tctcaaactc ttggactcaa GGGATCCATC TACCTCGGCT TCCCAAAGTG 16870
ctgaattaca ggcatgagcc accatgtctg gcctaatttt CAACACTTTT ATATTCTTAT AGTGTGGGTA 16940
TGTCCTGTTA ACAGCATGTA GGTGAATTTC CAATCCAGTC TGACAGTCGT TGTTTAACTG GATAACCTGA 17010
TTTATTTTCA TTTTTTTGTC ACTAGAGACC CGCCTGGTGC ACTCTGATTC TCCACTTGCC TGTTGCATGT 17080
CCTCGTTCCC TTGTTTCTCA CCACCTCTTG GGTTGCCATG TGCGTTTCCT GCCGAGTGTG TGTTGATCCT 17150
CTCGTTGCCT CCTGGTCACT GGGCATTTGC TTTTATTTCT CTTTGCTTAG TGTTACCCCC TGATCTTTTT 17220
ATTGTCGTTG TTTGCTTTTG TTTATTGAGA CAGTCTCACT CTGTCACCCA GGCTGGAGTG TAATGGCACA 17290
ATCTCGGCTC ACTGCAACCT CTGCCTCCTC GGTTCAAGCA GTTCTCATTC CTCAACCTCA TGAGTAGCTG 17360
GGATTACAGG CGCCCACCAC CACGCCTGGC TAATTTTTGT ATTTTTAGTA GAGATAGGCT TTCACCATGT 17430
TGGCCAGGCT GGTCTCAAAC TCCTGACCTC AAGTGATCTG CCCGCCTTGG CCTCCCACAG TGCTGGGATT 17500
ACAGGTGCAA GCCACCGTGC CCGGCATACC TTGATCTTTT AAAATGAAGT CTGAAACATT GCTACCCTTG 17570
TCCTGAGCAA TAAGACCCTT AGTGTATTTT AGCTCTGGCC ACCCCCCAGC CTGTGTGCTG TTTTCCCTGC 17640
TGACTTAGTT CTATCTCAGG CATCTTGACA CCCCCACAAG CTAAGCATTA TTAATATTGT TTTCCGTGTT 17710
GAGTGTTTCT GTAGCTTTGC CCCCGCCCTG CTTTTCCTCC TTTGTTCCCC GTCTGTCTTC TGTCTCAGGC 17780
CCGCCGTCTG GGGTCCCCTT CCTTGTCCTT TGCGTGGTTC TTCTGTCTTG TTATTGCTGG TAAACCCCAG 17850
CTTTACCTGT GCTGGCCTCC ATGGCATCTA GCGACGTCCG GGGACCTCTG CTTATGATGC ACAGATGAAG 17920
ATGTGGAGAC TCACGAGGAG GGCGGTCATC TTGGCCCGTG AGTGTCTGGA GCACCACGTG GCCAGCGTTC 17990
CTTAGCCAGT GAGTGACAGC AACGTCCGCT CGGCCTGGGT TCAGCCTGGA AAACCCCAGG CATGTCGGGG 18060
TCTGGTGGCT CCGCGGTGTC GAGTTTGAAA TCGCGCAAAC CTGCGGTGTG GCGCCAGCTC TGACGGTGCT 18130
GCCTGGCGGG GGAGTGTCTG CTTCCTCCCT TCTGCTTGGG AACCAGGACA AAGGATGAGG CTCCGAGCCG 18200
TTGTCGCCCA ACAGGAGCAT GACGTGAGCC ATGTGGATAA TTTTAAAATT TCTAGGCTGG GCGCGGTGGC 18270
TCACGCCTGT AATCCCAGCA CTTTGGGAGG CCAAGGCGGG TGGATCACGA GGTCAGGAGG TCGAGACCAT 18340
CCTGGCCAAC ATGATGAAAC CCCATCTGTA CTAAAAACAC AAAAATTAGC TGGGCGTGGT GGCGGGTGCC 18410
TGTAATCCCA GCTACTCGGG AGGCTGAGGC AGGAGAATTG CTTGAACCTG GGAGTTGGAA GTTGCAGTGA 18480
GCCGACATTG CACCACTGCA CTCCAGCCTG GCAACACAGC GAGACTCTGT CTCAAAAAAA AAAAAAAAAA 18550
AAAAAAAAAA AATTCTAGTA GCCACATTAA AAAAGTAAAA AAGAAAAGGT GAAATTAATG TAATAATAGA 18620
TTTTACTGAA GCCCAGCATG TCCACACCTC ATCATTTTAG GGTGTTATTG GTGGGAGCAT CACTCACAGG 18690
ACATTTGACA TTTTTTGAGC TTTGTCTGCG GGATCCCGTG TGTAGGTCCC GTGCGTGGCC ATCTCGGCCT 18760
GGACCTGCTG GGCTTCCCAT GGCCATGGCT GTTGTACCAG ATGGTGCAGG TCCGGGATGA GGTCGCCAGG 18830
CCCTCAGTGA GCTGGATGTG CAGTGTCCGG ATGGTGCACG TCTGGGATGA GGTCGCCAGG CCCTGCTGTG 18900
AGCTGGATGT GTGGTGTCTG GATGGTGCAG GTCAGGGGTG AGGTCTCCAG GCCCTCGGTG AGCTGGAGGT 18970
ATGGAGTCCG GATGATGCAG GTCCGGGGTG AGGTCGCCAG GCCCTGCTGT GAGCTGGATG TGTGGTGTCT 19040
GGATGGTGCA GGTCAGGGGT GAGGTCTCCA GGCCCTCGGT AAGCTGGAGG TATGGAGTCC GGATGATGCA 19110
GGTCCGGGGT GAGGTCGCCA GGCCCTGCTG TGAGCTGGAT GTGTGGTGTC TGGATGGTGC AGGTCTGGGG 19180
TGAGGTCACC AGGCCCTGCG GTGAGCTGGG TGTGCGGTGT CTGGATGGTG CAGGTCTGGA GTGAGGTCGC 19250
CAGACGGTGC CAGACCATGC GGTGAGCTGG ATATGCGGTG TCCGGATGGT GCAGGTCTGG GGTGAGGTTG 19320
CCAGGCCCTG CTGTGAGTTG GATGTGGGGT GTCCGGATGC TGCAGGTCCG GTGTGAGGTC ACCAGGCCCT 19390
GCTGTGAGCT GGATGTGTGG TGTCTGGATG GTGCAGGTCT GGGGTGAAGG TCGCCAGGCC CCTGCTTGTG 19460
AGCTGGATGT GTGGTGTCTG GATGGTGCAG GTCTGGAGTG AGGTCGCCAG GCCCTCGGTG AGCTGGATGT 19530
GCAGTGTCCA GATGGTGCAG GTCCGGGGTG AGGTCGCCAG ACCCTGCGGT GAGCTGGATG TGCGGTGTCT 19600
GGATGGTGCA GGTCTGGAGT GAGGTCGCCA GGCCCTCGGT GAGCTGGATG TATGGAGTCC GGATGGTGCC 19670
GGTCCGGGGT GAGGTCGCCA GACCCTGCTG TGAGCTGGAT GTGCGGTGTC TGGATGGTAC AGGTCTGGAG 19740
TGAGGTCGCC AGACCCTGCT GTGAGCTGGA TATGCGGTGT CCGGATGGTG CAGGTCAGGG GTGAGGTCTC 19810
CAGGCCCTCG GTGAGCTGGA GGTATGGAGT CCGGATGATG CAGGTCCGGG GTGAGGTCGC CAGGCCCTGC 19880
TGTGAACTGG ATGTGCGGCG TCTGGATGGT GCAGGTCTGG GGTGTGGTCG CCAGGCCCTG GGTGAGCTGG 19950
AGGTATGGAG TCCGGATGAT GCAGGTCCGG GGTGAGGTCG CCAGGCCCTG CTGTGAGCTG GATGTGCGGC 20020
GTCTGGATGG TGCAGGTCTG GGGTGTGGTC GCCAGGCCCT CGGTGAGCTG GAGGTATGGA GTCCGGATGA 20090
TGCAGGTCCG GGGTGAGGTT GCCAGGCCCT GCTGTGAGCT GGATGTGCTG TATCCGGATG GTGCAGTCCG 20160
GGGTGAGGTC GCCAGGCCCT GCTGTGAGCT GGATGTGCTG TATCCGGATG GTGCAGGTCT GGGGTGAGGT 20230
CACCAGGCCC TGCGGTGAGC TGGTTGTGCG GTGTCCGGTT GCTGCAGGTC CGGGGTGAGT TCGCCAGGCC 20300
CTCGGTGAGC TGGATGTGCG GTGTCCCCGT GTCCGGATGG TGCAGGTCCA GGGTGAGGTC GCTAGGCCCT 20370
TGGTGGGCTG GATGTGGGGT GTCCGGATGG TGCAGGTCTG GGGTGAGGTC GCCAGGCCTT TGGTGAGCTG 20440
GATGTGGGGT GTCTGCATGG TGCAGGTCTG GGGTGAGGTC GCCAGGCCCT TGGTGGGCTG GATGTGTGGT 20510
GTCCGGATGG TGCAGGTCCG GGGTGAGGTC GCCAGGCCCT GCTGTGAGCT GGATGTGCGG TGTCTGGATG 20580
GTGCAGGTCC GGGGTGAGGT AGCCAAGGCC TTCGGTGAGC TGGATGTGGG GTGTCCGGAT GCTGCAGGTC 20650
CGGGGTGAGG TCGCCAGGCC CTGCGGTTAG CTGGATATGC GGTGTCCGGA TGGTGCAGGT CCGGGGTGAG 20720
GTCACCAGGC CCTGCGGTTA GCTGGATGTG CGGTGTCTGG ATGGTGCAGG TCCGGGGTGA GGTCGCCAGG 20790
CCCTGCTGTG AGCTGGATGT GCTGTATCCG GATGGTGCAG GTCCGGGGTG AGGTCGCCAG GCCCTGCAGT 20860
GAGCTGGATG TGCTGTATCC GGATGGTGCA GGTCTGGCGT GAGGTCGCCA GGCCCTGCGG TTAGCTGGAT 20930
ATGCGGTGTC GGATGGTGCA GGTCCGGGGT GAGGTCACCA GGCCCTGCGG TTAGCTGGAT GTGCGGTGTC 21000
CGGATGGTGC AGGTCT3GGG TGAGGTCGCC AGGCCCTGCT GTGAGCTGGA TGTGCTGTAT CCGGATGGTG 21070
CAGGTCCGGG GTGAGGTCGC CAGGCCCTGC GGTGAGCTGG ATGTGCTGTA TCCGGATGGT GCAGGTCTGG 21140
CGTGAGGTCG CCAGGCCCTG CGGTGAGCTG GATGTGCAGT GTACGGATGG TGCAGGTCCG GGGTGAGGTC 21210
GCCAGGCCCT GCGGTGGGCT GTATGTGTGT TGTCTGGATG GTGCAGGTCC GGGGrGAGTT CGCCAGGCCC 21280
TGCGGTGAGC TGGATGTGTG GTGTCTGGAT GCTGCAGGTC CGGGGTGAGT TCGCCAGGCC CTCGGTGAGC 21350
TGGATATGCG GTGTCCCCGT GTCCGAATGG TGCAGGTCCA GGGTGAGGTC GCC-.3GCCCT TGGTGGGCTG 21420
GATGTGCCGT GTCCGGATGG TGCAGGTCTG GGGTGAGGTC GCCAGGCCCT TGGTGAGCTG GATGTGGGGT 21490
GTCCGGATGG TGCAGGTCCG GGGTGAGGTC ACCAGGCCCT CGGTGATCTG GATGTGGCAT GTCCTTCTCG 21560

TTTAAGGGGT TGGCTGTGTT CCGGCCGCAG AGCACCGTCT GCGTGAGGAG ATCCTGGCCA AGTTCCTGCA 21630 

CTGGCTGATG AGTGTGTACG TCGTCGAGCT GCTCAGGTCT TTCTTTTATG TCACGGAGAC CACGTTTCAA 21700 

AAGAACAGGC TCTTTTTCTA CCGGAAGAGT GTCTGGAGCA AGTTGCAAAG CATTGGAATC AGGTACTGTA 21770 

TCCCCACGCC AGGCCTCTGC TTCTCGAAGT CCTGGAACAC CAGCCCGGCC TCAGCATGCG CCTGTCTCCA 21840 

CTTGCCTGTG CTTCCCTGGC TGTGCAGCTC TGGGCTGGGA GCCAGGGGCC CCGTCACAGG CCTGGTCCAA 21910 

GTGGATTCTG TGCAAGGCTC TGACTGCCTG GAGCTCACGT TCTCTTACTT GTAAAATCAG GAGTTTGTGC 21980 

CAAGTGGTCT CTAGGGTTTG TAAAGCAGAA GGGATTTAAA TTAGATGGAA ACACTACCAC TAGCCTCCTT 22050 

GCCTTTCCCT GGGATGTGGG TCTGATTCTC TCTCTCTTTT TTTTTTCTTT TTTGAGATGG AGTCTCACTC 22120 

TGTTGCCCAG GCTGGAGTGC AGTGGCATAA TCTTGGCTCA CTGCAACCTC CACCTCCTGG GTTTAAGCGA 22190 

TTCACCAGCC TCAGCCTCCT AAGTAGCTGG GATTACAGGC ACCTGCCACC ACGCCTGGCT AATTTTTGTA 22260

CTTTTAGGAG AGACGGGGTT TCACCATGTT GGCCAGGCTG GTCTCGAACT CATGACCTCA GGTGATCCAC 22330 

CCACCTTGGC CTCCCAAAGT GCTGGGTTTA CAGGCTAAGC CACCGTGCCC AGCCCCCGAT TCTCTTTTAA 22400

TTCATGCTGT TCTGTATGAA TCTTCAATCT ATTGGATTTA GGTCATGAGA GGATAAAATC CCACCCACTT 22470 

GGCGACTCAC TGCAGGGAGC ACCTGTGCAG GGAGCACCTG GGGATAGGAG AGTTCCACCA TGAGCTAACT 22540 

TCTAGGTGGC TGCATTTGAA TGGCTGTGAG ATTTTGTCTG CAATGTTCGG CTGATGAGAG TGTGAGATTG 22610 

TGACAGATTC AAGCTGGATT TGCATCAGTG AGGGACGGGA GCGCTGGTCT GGGAGATGCC AGCCTGGCTG 22680 

AGCCCAGGCC ATGGTATTAG CTTCTCCGTG TCCCGCCCAG GCTGACTGTG GAGGGCTTTA GTCAGAAGAT 22750

CAGGGCTTCC CCAGCTCCCC TGCACACTCG AGTCCCTGGG GGGCCTTGTG ACACCCCATG CCCCAAATCA 22820 

GGATGTCTGC AGAGGGAGCT GGCAGCAGAC CTCGTCAGAG GTAACACAGC CTCTGGGCTG GGGACCCCGA 22890

CGTGGTGCTG GGGCCATTTC CTTGCATCTG GGGGAGGGTC AGGGCTTTCC CTGTGGGAAC AAGTTAATAC 22960 

ACAATGCACC TTACTTAGAC TTTACACGTA TTTAATGGTG TGCGACCCAA CATGGTCATT TGACCAGTAT 23030 

TTTGGAAAGA ATTTAATTGG GGTGACCGGA AGGAGCAGAC AGACGTGGTG GTCCCCAAGA TGCTCCTTGT 23100 

CACTACTGGG ACTGTTGTTC TGCCTGGGGG GCCTTGGAGG CCCCTCCTCC CTGGACAGGG TACCGTGCCT 23170 

TTTCTACTCT GCTGGGCCTG CGGCCTGCGG TCAGGGCACC AGCTCCGGAG CACCCGCGGC CCCAGTGTCC 23240 

ACGGAGTGCC AGGCTGTCAG CCACAGATGC CCAGGTCCAG GTGTGGCCGC TCCAGCCCCC GTGCCCCCAT 23310 

GGGTGGTTTT GGGGGAAAAG GCCAAGGGCA GAGGTGTCAG GAGACTGGTG GGCTCATGAG AGCTGATTCT 23380 

GCTCCTTGGC TGAGCTGCCC TGAGCAGCCT CTCCCGCCCT CTCCATCTGA AGGGATGTGG CTCTTTCTAC 23450 

CTGGGGGTCC TGCCTGGGGG CAGCCTTGGG CTACCCCAGT GGCTGTACCA GAGGGACAGG CATCCTGTGT 23520 

GGAGGGGCAT GGGTTCACGT GGCCCCAGAT GCAGCCTGGG ACCAGGCTCC CTGGTGCTGA TGGTGGGACA 23590 

GTCACCCTGG GGGTTGACCG CCGGACTGGG CGTCCCCAGG GTTGACTATA GGACCAGGTG TCCAGGTGCC 23660 

CTGCAAGTAG AGGGGCTCTC AGAGGCGTCT GGCTGGCATG GGTGGACGTG GCCCCGGGCA TGGCCTTCAG 23730

CGTGTGCTGC CGTGGGTGCC CTGAGCCCTC ACTGAGTCGG TGGGGGCTTG TGGCTTCCCG TGAGCTTCCC 23800 

CCTAGTCTGT TGTCTGGCTG AGCAAGCCTC CTGAGGGGCT CTCTATTGCA GACAGCACTT GAAGAGGGTG 23870 

CAGCTGCGGG AGCTGTCGGA AGCAGAGGTC AGGCAGCATC GGGAAGCCAG GCCCGCCCTG CTGACGTCCA 23940 

GACTCCGCTT CATCCCCAAG CCTGACGGGC TGCGGCCGAT TGTGAACATG GACTACGTCG TGGGAGCCAG 24010 

AACGTTCCGC AGAGAAAAGA GGGTGGCTGT GCTTTGGTTT AACTTCCTTT TTAAACAGAA GTGCGTTTGA 24080

GCCCCACATT TGGTATCAGC TTAGATGAAG GGCCCGGAGG AGGGGCCACG GGACACAGCC AGGGCCATGG 24150 

CACGGCGCCA ACCCATTTGT GCGCACAGTG AGGTGGCCGA GGTGCCGGTG CCTCCAGAAA AGCAGCGTGG 24220 

GGGTGTAGGG GGAGCTCCTG GGGCAGGGAC AGGCTCTGAG GACCACAAGA AGCAGCCGGG CCAGGGCCTG 24290 

GATGCAGCAC GGCCCGAGGT CCTGGATCCG TGTCCTGCTG TGGTGCGCAG CCTCCGTGCG CTTCCGCTTA 24360

CGGGGCCCGG GGACCAGGCC ACGACTGCCA GGAGCCCACC GGGCTCTGAG GATCCTGGAC CTTGCCCCAC 24430

GGCTCCTGCA CCCCACCCCT GTGGCTGCGG TGGCTGCGGT GACCCCGTCA TCTGAGGAGA GTGTGGGGTG 24500 

AGGTGGACAG AGGTGTGGCA TGAGGATCCC GTGTGCAACA CACATGCGGC CAGGAACCCG TTTCAAACAG 24570 

GGTCTGAGGA AGCTGGGAGG GGTTCTAGGT CCCGGGTCTG GGTGGCTGGG GACACTGGGG AGGGGCTGCT 24640

TCTCCCCTGG GTCCCTATGG TGGGGTGGGC ACTTGGCCGG ATCCACTTTC CTGACTGTCT CCCATGCTGT 24710 

CCCCGCCAGG CCGAGCGTCT CACCTCGAGG GTGAAGGCAC TGTTCAGCGT GCTCAACTAC GAGCGGGCGC 24780 

GGCGCCCCGG CCTCCTGGGC GCCTCTGTGC TGGGCCTGGA CGATATCCAC AGGGCCTGGC GCACCTTCGT 24850 

GCTGCGTGTG CGGGCCCAGG ACCCGCCGCC TGAGCTGTAC TTTGTCAAGG TGGGTGCCGG GGACCCCCGT 24920

GAGCAGCCCT GCTGGACCTT GGGAGTGGCT GCCTGATTGG CACCTCATGT TGGGTGGAGG AGGTACTCCT 24990 

GGGTGGGCCG CAGGGAGTGC AGGTGACCCT GTCACTGTTG AGGACACACC TGGCACCTAG GGTGGAGGCC 25060

TTCAGCCTTT CCTGCAGCAC ATGGGGCCGA CTGTGCACCC TGACTGCCCG GGCTCCTATT CCCAAGGAGG 25130 

GTCCCACTGG ATTCCAGTTT CCGTCAGAGA AGGAACCGCA ACGGCTCAGC CACCAGGCCC CGGTGCCTTG 25200 

CACCCCAGTC CTGAGCCAGG GGTCTCCTGT CCTGAGGCTC AGAGAGGGGA CACAGCCCGC CCTGCCCTTG 25270 

GGGTCTGGAG TGGTGGGGGT CAGAGAGAGA GTGGGGGACA CCGCCAGGCC AGGCCCTGAG GGCAGAGGTG 25340 

ATGTCTGAGT TTCTGCGTGG CCACTGTCAG TCTCCTCGCC TCCACTCACA CAGGTGGATG TGACGGGCGC 25410 

GTACGACACC ATCCCCCAGG ACAGGCTCAC GGAGGTCATC GCCAGCATCA TCAAACCCCA GAACACGTAC 25480 

TGCGTGCGTC GGTATGCCGT GGTCCAGAAG GCCGCCCATG GGCACGTCCG CAAGGCCTTC AAGAGCCACG 25550 

TAAGGTTCAC GTGTGATAGT CGTGTCCAGG ATGTGTGTCT CTGGGATATG AATGTGTCTA GAATGCAGTC 25620

GTGTCTGTGA TGCGTTTCTG TGGTGGAGGT ACTTCCATGA TTTACACATC TGTGATATGC GTGTGTGGCA 25690
CGTGTGTGTC GTGGTGCATG TATC 
CCATGGTGTG TGTGCCTGTG GTGT 
CATGTCTGTG ATGTGCCTAT TTGT 
GGTGTGTGTG GCCCCTTGGC CTTA 
CGGGTGCTGG TTTGGG3AGC TCCA 
GGGCTGGGCC T7GGAGACTG TAAG 
CCCTGGCACC CCCAGG^CCC CAGT 
TCGCTCCCCG GGACACACTC CTCC 
TGGGCTTGGG TTCCCACCCA GTGG 
GGGTGCAGAG GTGAAG.^AGT ATCC 
CCTCTTTCTC T3ACTTCTTG AGCT 



TGTGGC GTGCATATTT GTGGTGT3TG 

GCATGT GTGTGTGTCT GTGACACGTG 

GGTGTG TGTGTGCATG TGTCCGT3AC 

CTCCTT CCTCCTCCAG GCATGGTCCG 

3CAGGT TTGAGAGGAG AGTAGGGATG 

CTGGC: TATGCCGGCT CCATGAG^TA 

CAGAGC GGCCGGGGGC CTTGGGGCTC 

TCATGi GCACGCTGGA GGGGTAAGCC 



TGTGTGTGGC ACGTGTGTGT 25760

CATGTTCATG CTGTGTGCTG 25830 

ATATGCGTG7 CTATGGCATG 25900

CACCATTGTC CTCACGCTCT 25970 

GGTGCCCCTG TCCTGTC^CA 26040 

CTGGTGGTAC CTTCCTGGAC 26110 

TAGGAAGGCT Gi.TTCAGGCC 26180 

GGCAGGGGTG A.AS.GGGGCCC 26250 

CTC.iUWiGTCG TGCCAGGCCG 26320

CATGTGGAAA CCCACAAGG^ 26390



Le A 32 805-Foreign Countries 






Contig 2: 



TGTGGGATTG GTTTTCATGT GTGGGATAGG TGGGGATCTG TGGGATTGGT TTTTATGAGT GGGGTAACAC 70 

AGAGTTCAAG GCGAGCTTTC TTCCTGTAGT GGGTCTGCAG GTGCTCCAAC AGCTTTATTG AGGAGACCAT 140

ATCTTCCTTT GAACTATGGT CGGGTTTATA GTAAGTCAGG GGTGTGGAGG CCTCCCCTGG GCTCCCTGTT 210 

CTGTTTCTTC CACTCTGGGG TCGTGTGGTG CCTGCTGTGG TGTGTGGCCG GTGGGCAGGG CTTCCAGGCC 280 

TCCTTGTGTT CATTGGCCTG GATGTGGCCC TGGCTACGCT CCGTCCTTGG AATTCCCCTG CGAGTTGGAG 350 

GCTTTCTTTC TTTCTTTTTT TCTTTCTTTT tTTTTTTTTT TGATAACAGA GTCTCGCTCT TTTTTGCCCA 420

GGCTGGAGTG GTTTGGCGTG ATCTTGGCTC ACTGCAACCT GTGCTTCCTG AGTTCAAGCA ATTCTCTTGC 490 

CTCAGCCTCC CAAGTAGCTG GAATTATAGG CGCCCACCAC CATGCTGACT AATTTTTGTA ATTTTAGTAG 560 

AGACGAGGTT TCTCCATGTT GGCCAGGCTG GTCTCGAACT CCTGACCTCA GGTGATCCTC CCACCTCGGC 630 

CTCCCAAAGT GCTGGGATGA CAGGTGTGAA CCGCCGCGCC CGGCCGAGAC TCGCTTCCTG CAGCTTCCGT 700 

GAGATCTGCA GCGATAGCTG CCTGCAGCCT TGGTGCTGAC AACCTCCGTT TTCCTTCTCC AGGTCTCGCT 770

AGGGGTCTTT CCATTTCATG ACTCTCTTCA CAGAAGAGTT TCACGTGTGC TGATTTCCCG GCTGTTTCCT 840

GCGTAATTGG TGTCTGCTGT TTATCGATGG CCTCCTTCCA TTTCCTTTAG GCTTTGTTTA TTGTTGTTTT 910 

TCCGGCTCCT TGAAGGAAAA GTTTCGATTA TGGATGTTTG AACTTTCTTT TCTAAACAAG CATCTGAAGT 980

TGCCGTTTTC CCTCTAAAGC AGGGATCCCG AGGCCCCTGG CTGTGGAGTG GCACCGGTCT GGGGCCTGTT 1050 

AGGAACCCGG CGCACAGCGG GAGGCTAGGT GGGGTGTGGG GAGCCAGCGT TCCCGCCTGA GCCCCGCCCC 1120 

TCTCAGATCA GCAGTGGCAT GCGGTGCTCA GAGGCGCACA CACCCTACTG AGAACTGTGC GTGAGAGGGG 1190 

TCTAGATTCT GTGCTCCTTA TGGGAATCTA ATGCCTGATG ATCTGAGGTG GAACCGTTTG CTCCCAAAAC 1260

CATCCCCTTC CCCACTGCTG TCCTGTGGAA AAATCGTCTT CCACGAAACC AGTCCCTGGT ACCACAATGG 1330 

TTGGGGACCC TGTGCTAAAG ACCTGCTTCA GCAGCCTCTC GTCAGTGTTG ATATATTGGC TTTTCTGTGT 1400 

TGAGTCCAGA ATAATTACGG ATTTCTGTGA_ TGCTTTCCGC CGACCTCAGA CCCATGGGCT ATTTGTGGGC 1470 

GTGTTGCCTG CTCCTGGGTT GGGAAGGGTG CAGGCCCCAT GTACCTTCCT GTTACTGCCT TCCAGGTTGG 1540 

TTCTCAGGGT TGAATCGTAC TCGATGTGGT TTTAGCCCAC GGCCCTGCCG CCAGCTCCTG GGGGCTGGGG 1610 

AACATGCTGA AGCACAGAGT CACCGTGCGC GTCTTTTGAT GCCTCACAAG CTCGAGGCCT CCTGTGTCCG 1680

TGTTAGTGTG TGTCACGTGC CTGCTCACAT CCTGTCTTGG GGACGCAGGG GCTTAGCAGG TCCCGTAGTA 1750 

AATGACAAGC GTCCTGGGGG AGTCTGCAGA ATAGGAGGTG GGGGTGCCGG TCTCTCTCCC GCGTCTTCAG 1820 

ACTCTTCTCC TGCCTGTGCT GTGGCTGCAC CTGCATCCCT GCAATCCCTC CAGCACTGGG CTGGAGAGGC 1890

CCGGGAGCTC GAGTGCCACT TGTGCCACGT GACTGTGGAT GGCAGTCGGT CACGGGGGTC TGATGTGTGG 1960 

TGACTGTGGA TGGCGGTTGG TCACAGGGGT CTGATGTGTG GTGACTGTGG ATGGCGGTCG TGGGGTCTGA 2030 

TGTGGTGACT GTGGATGGCG GTCGTGGGGT CTGATGTGTG GTGACTGTGG ATGGCGGTCG TGGGGTCTGA 2100 

TGTGGTGACT GTGGATGGCG GTCGTGGGGT CTGATGTGGT GACTGTGGAT GGCGGTCGTG GGGTCTGATG 2170

TGGTGACTGT GGATGGCAGT CGTGGGGTCT GATGTGTGGT GACTGTGGAT GGCGGTCGTG GGGTCTGATG 2240 

TGGTGACTGT GGATGGCAGT CGTGGGGTCT GATGTGTGGT GACTGTGGAT GGCGGTCGTG GGGTCTGATG 2310 

TGTGGTGACT GTGGATGGCG GTCGTGGGGT CTGATGTGTG GTGACTGTGG ATGGCGGTCG TGGGGTCTGA 2380 

TGTGTGGTGA CTGTGGATGG CGGTCGTGGG GTCTGATGTG GTGACTGTGG ATGGCGGTCG TGGGGTCTGA 2450 

TGTGTGGTGA CTGTGGATGG TGATCGGTCA CAGGGGTCTG ATGTGTGGTG ACTGTGGATG GCGGTCGTGG 2520 

GGTCTGATGT GTGGTGACTG TGGATGGTGA TCGGTCACAG GGGTCTGATG TGTGGTGACT GTGGATGGCG 2590 

GTCGrGGGGT CTGATGTGTG GTGACTGTGG ATGGCGGTTG GTCCCGGGGG TCTGATGTGT GGTGACTGTG 2660

GATGGCGATC GGTCACAGGG GTCTGATGTG TGGTGACTGT GGATGGCGGT CGTGGGGTCT GATGTGTGGT 2730 

GACTGTGGAT GGCGGTCGTG GGGTCTGATG TGTGGTGACT GTGGATGGCG GTCGTGGGGT CTGATGTGGT 2800 

GACTGTGGAT GGCGGTCGTG GGGTCTGATG TGGTGACTGT GGATGGCGGT CGTGGGGTCT GATGTGTGGT 2870 

GACTGTGGAT GGCGGTTGGT CCCGGGGGTC TGATGTGTGG TGACTGTGGA TGGCGGTCGT GGGGTCTGAT 2940

GTGGTGACTG TGGATGGCAG TCGTGGGGTC TGATGTGTGG TGACTGTGGA TGGCGGTCGT GGGGTCTGAT 3010 

GTGTGGTGAC TGTGGATGGC GGTCGTGGGG TCTGATGTGT GGTGACTGTG GATGGCGGTC GTGGGGTCTG 3080 

ATGTGTGGTG ACTGTGGATG GCGGTCGTGG GGTCTGATGT GGTGACTGTG GATGGCGGTC GTGGGGTCTG 3150 

ATGTGTGGTG ACTGTGGATG GTGATCGGTC ACAGGGGTCT GATGTGTGGT GACTGTGGAT GGCGGTCGTG 3220 

GGGTCTGATG TGTGGTGACT GTGGATGGCG GTCGTGGGGT CTGATGTGGT GACTGTGGAT GGCGGTCGTG 3290

GGGTCTGATG TGTGGTGACT GTGGATGGCG GTCGTAGGGT CTGATGTGTG GTGACTGTGG ATGGCAGTCG 3360

GTCACAGGGG TCTGATGTGT GGTGACTGTG GATGGCGGTC GTGGGGTCTG ATGTGTGGTG ACTGTGGATG 3430 

GCGGTCGTGG GGTCTGATGT GTGGTGACTG TGGATGGCGG TCGTGGGGTC TGATGTGTGG TGACTGTGGA 3500 

TGGCGGTCGT GGGGTCTGAT GTGGTGACTG TGGATGGTGA TCGGTCACAG GGGTCTGATG TGTGGTAGCT 3570

GCAGGTGGAG TCCCAGGTGT GTCTGTAGCT ACTTTGCGTC CTCGGCCCCC CGGCCCCCGT TTCCCAAACA 3640 

GAAGCTTCCC AGGCGCTCTC TGGGCTTCAT CCCGCCATCG GGCTTGGCCG CAGGTCCACA CGTCCTGATC 3710 

GGAAGAAACA AGTGCCCAGC TCTGGCCGGG GCAGGCCACA TTTGTGGCTC ATGCCCTCTC CTCTGCCGGC 3780

AGGTCTCTAC CTTGACAGAC CTCCAGCCGT ACATGCGACA GTTCGTGGCT CACCTGCAGG AGACCAGCCC 3850

GCTGAGGGAT GCCGTCGTCA TCGAGCAGGT CTGGGCACTG CCCTGCAGGG TTGGGCACGG ACTCCCAGCA 3920

GTGGGTCCTC CCCTGGGCAA TCACTGGGCT CATGACCGGA CAGACTGTTG GCCCTGGGGG GCAGTGGGGG 3990 

GAATGAGCTG TGATGGGGGC ATGATGAGCT GTGTGCCTTG GCGAAATCTG AGCTGGGCCA TGCCAGGCTG 4060

CGACAGCTGC TGCATTCAGG CACCTGCTCA CGTTTGACTG CGCGGCCTCT CTCCAGTTCC GCAGTGCCTT 4130 

TGTTCATGAT TTGCTAAATG TCTTCTCTGC CAGTTTTGAT CTTGAGGCCA AAGGAAAGGT GTCCCCCTCC 4200 

TTTAGGAGGG CAGGCCATGT TTGAGCCGTG TCCTGCCCAG CTGGCCCCTC AGTGCTGGGT CTGAGGCCAA 4270 

AGGAA.ACGTG TCCCCCTTCT TAGGAGGACG GGCCGTGTTT GAGCCACGCC CCGCTGAGCG GGCCTCTCAG 4340 

TGCTGGGTCT GTCCACGTGG CCCTGTGGCC CTTTGCAGAT GTGGTCTGTC CACGTGGCCC TGTGGCTCTT 4410 

TGCAGATGCC TGTTAGCACT TGCTCGGCTC TAGGGGACAG TCGTGTCCAC C3CATGAGGC TCAGAGACCT 4480 

CTGGGCGAAT TTCCTTGGCT CCCAGGGTGG GGGTGGAGGT GGCCTGGGCT GCTGGGACCC AGACCCTGTG 4550 

CCCGGCAGCT GGGCAGCAAC TCCTGGATCA CATATGCCAT CCGGGCCACG GTGGGCTGTG TGGGTGTGAG 4620 

CCCAGCTGGA CCCACAGGTG GCCCAGAGGA GACGTTCTGT GTCACACACT CTGCCTAAGC CCATGTGTGT 4690 

CTGC=iGAGA; TCGGCCCGGC CAGCCCACGA TGGCCCTGCA TTCCAGCCCA GCCCCGCACT TCATCACTVAA 4760 

CACTGACCCC AAAAGGGACG GAGGGTCTTG GCCACGTGGT CCTGCCTGTC TCAGCACCCA CCGGCTCACT 4830

CCCATGTGTC TCCCGTCTGC TTTCGCAGAG CTCCTCCCTG AATGAGGCCA GCAGTGGCCT CTTCGACGTC 4900 

TTCCTACGC7 TCATGTGCCA CCACGCCGTG CGCATCAGGG GCAAGTGAGT CAGGTGGCCA GGTGCCATTG 4970 

CCCTGCGGG? GGCTGGGCGG GCTGGCAGGG CTTCTGCTCA CCTCTCTCC7 GCCCCTTCCC CACT3NCCTT 5040 

CTGCCCGGGG CCACCAGAGT CTCCTTTTCT GGCCCCCGCC CCCTCCGGCT CCTGGGCTGC AGGCTCCCGA 5110 

GGCCCCGGAA ACATGGCTCG GCTTGCGGCA GCCGGAGCGG AGCAGGTGCC ACACGAGGCC TGGAAATGGC 5180 

-.AG:GG3GT3 TGGAGTTGCT CCTGCGTGGA GGACGAGGGG CGGGGGGTGT GTCTGGGTCA GGTGTGCGCC 5250 
GAGCGTTTGA GCCTGCAGCT TGTCAGCTCC AAGTTACTAC TGACGCTGGA CACCCGGCTC TCACACGCTT 5320

GTATCTCTCT CTCCCGATAC AAAAGGATTT TATCCGATTC TCATTCCTGT CCCTGTCGTG TGRCCCCCGC 5390 

GAGGGCGCGG GCTCTTCTCT CTGTGACTAG ATTTCCCATC TGGAAAGTGC GGGGTTGACC GTGTAGTTTG 5460

CTCCTCTCGG GGGGCCTGTG GTGGCCATGG GGCAGGCGGC CTGGGAGAGC TGCCGTCACA CAGCCACTGG 5530 

GTGAGCCACA CTCACGGTGG TAGAGCCACA GTGCCTGGTG CCACATCACG TCCTCTGGAT TTTAAGTAAA 5600 

ACCACACACC TCCCGGCAGG CATCTGCCTG CGACCCTGTG TGTGCCTGGG GAGAGTGGTA GCACGGAGGA 5670 

AATTCGTGCA CACTCAAGGT CATCAGCAAG GTCATCCGCA GTCAGGTGGA ACGTGGAGGC CTCTCTCTGG 5740

GATCGTCTCC AGCGGATAAA GGACTGTGCA CAGCTTCGGA AGCTTTTATT TAAAAATATA ACTATTAATT 5810 

ATTGCATTAT AAGTAATCAC TAATGGTATC AGCAATTATA ATATTTATTA AAGTATAATT AGAAATATTA 5880 

AGTAGTACAC ACGTTCTGGA AAAACACAAA TTGCACATGG CAGCAGAGTG AATTTTGGCC GAGGGACACG 5950 

TGTGCACATG TGTGTAAGCG GCCCCCAGGC CCACAGAATT CGCTGACAAA GTCACCTCCC CAGAGAAGCC 6020

ACCACGGGCC TCCTTCGTGG TCGTGAATTT TATTAAGATG GATCAAGTCA CGTACCGTCC ACGTGTGGCA 6090 

GGGCTTTGGG GAATGTGAGG TGATGACTGC GTCCTCATGC CCTGACAGAC AGGAGGTGAC TGTGTCTGTC 6160 

CTGTCCCTAG GACACGGACA GGCCCGAAGC TCTAGTCCCC ATCGTGGTCC AGTTTGGCCT CTGAATAAAA 6230

ACGTCTTCAA AACCTGTTGC CCCAAAAACT AAGAACAGAG AGAGTTTCCC ATCCCATGTG CTCACAGGGG 6300 

CGTATCTGCT TGCGTTGACT CGCTGGGCTG GCCGGACTCC TAGAGTTGGT GCGTGTGCTT CTGTGCAAAA 6370 

AGTGCAGTCC TCTTGCCCAT CACTGTGATA TCTGCACCAG CAAGGAAAGC CTCTTTTCTT TTCTTTCTTT 6440

TTTTTTTTTT GAGACGGAAC GTCACTGTTG TGTGCCTGGG CTTGAGTGCA GTGGCGCGAT CTCAACTCAC 6510 

TGCAACCTCC GCCTCCCGGG TTCCAGCATT TCTCCTGCCT CAGCCTCCCG AGCAGCTGAG ATTACAGGCA 6580 

CCCACCCCCT GCGCCTGGCT AATTTTTGTA TTTTTAGTAG AGAGGGGTTT TTGCCATGTT GGCCAGGCTG 6650 

GTCTCGAACT CCTGACCTCA GGTGATCCAC CCACCTCGGC CTCCCAAAGT GCTGGGATTA CAGGTGTGAG 6720 

CCATCACGCC CAGCCGGAAA GCCTCTTTTT AAGGTGACCA CCTATAGCGC TTCCCGAAAA TAACAGGTCT 6790

TGTTTTTGCA GTAGGCTGCA AGCGTCTCTT AGCAACAGGA GTGGCGTCCT GTGGGCTCTG GGGATGGCTG 6860 

AGGGTCGCGT GGCAGCCATG CCTTCTGTGT GCACCTTTAG GTTCCACGGG GCTATTCTGC TCTCACTGTT 6930 

TGTCTGAAAA CGCACCCTTG GCATCCTTGT TTGGAGAGTT TCTGCTTCTC GTTGGTCATG CTGAAACTAG 7000 

GGGCAAGGTT GTATCCGTTG GCGCGCAGCG GCTACATGTA GGGTCATGAG TCTTTCACCG TGGACAAATT 7070 

CCTTGAAAAA AAAAAAAGGA GTCCGGTTAA GCATTCATTC CGGGTCAAGT GTCTGGTTCT GTGAATAAAC 7140

TCTAAGATTT AAGAAACCTT AATGAAAGAA AACCTTGATG ATTCAGAGCA AGGATGTGGT CACACCTGTG 7210 

GCTGGATCTG TTTCAGCCGC CCCAGTGCAT GGTGAGAGTG GGGAGCAGGG ATTGTTTGTT CAGAGGTCTC 7280

ATCTGGTATG TTTCTGAGGT GTTTGCCGGC TGAATGGTAG ACGTGTCGTT TGTGTGTATG AGGTTCTGTG 7350 

TCTGTGTGTG GCTCGGTTTG AGTGTACGCA TGTCCAGCAC ATGCCCTGCC CGTCTCTCAC CTGTGTCTTC 7420 

CCGCCCCAGG TCCTACGTCC AGTGCCAGGG GATCCCGCAG GGCTCCATCC TCTCCACGCT GCTCTGCAGC 7490 

CTGTGCTACG GCGACATGGA GAACAAGCTG TTTGCGGGGA TTCGGCGGGA CGGGTGAGGC CTCCTCTTCC 7560

CCAGGGGGGC TTGGGTGGGG GTTGATTTGC TTTTGATGCA TTCAGTGTTA ATATTCCTGG TGCTCTGGAG 7630 

ACCATGACTG CTCTGTCTTG AGGAACCAGA CAAGGTTGCA GCCCCTTCTT GGTATGAAGC CGCACGGGAG 7700 

GGGrXGCACA GCCTGAGGAC TGCGGGCTCC ACGCAGGCTC TGTCCAGCGG CCATGTCCAG AGGCCTCAGG 7770 

GCTCAGCAGG CGGGAGGGCC GCTGCCCTGC ATGATGAGCA TGTGAATTCA ACACCGAGGA AGCACACCAG 7840 

CTTCTGTCAC GTCACCCAGG TTCCGTTAGG GTCCTTGGGG AGATGGGGCT GGTGCAGCCT GAGGCCCCAC 7910 

ATCTCCCAGC AGGCCCTCGA CAGGTGGCCT GGACTGGGCG CCTCTTCAGC CCATTGCCCA TCCCACTTGC 7980 

ATGGGGTCTA CACCCAAGGA CGCACACACC TAAATATCGT GCCAACCTAA TGTGGTTCAA CTCAGCTGGC 8050 

TTTTATTGAC AGCAGTTACT TTTTTTTTTT TAATACTTTA AGTTCTAGGG TACATGTGCA CGACGTGCAG 8120 

GTTAGTTACA TATGTATACA TGTGCCATGT TGGTGTGCTG CACCCATTAA CTCATCATTT ACATTAGGTA 8190 

TATCTCCTAA TGCTATCCCT CCCCACTCCC CCCATCCCAT GACAGGCCCT GGTGTGTGAT GTTCCCCACC 8260

CTGTGTCCAA GTGTTCTCAT TGTTCAGTTC CCACCTGTGA GTGAGAACAT GTGGTGTTTG GTTTTCTTTC 8330 

CTTGCAATAG TTTGCTCAGA GTGATGGTTT CCAGCTTCGT CCATGTCCCT ACAAAGGACA TGAACTCATC 8400 

CTTTTTTATG ACTGCATAGT ATTCCGTGGT GTATATGTGC CACATTTTCT TAATCCAGTC TATCATCGAT 8470 

GGACATTTGG GTTGGTTGCA AGTCTTTGCT ACTGTGAATA GTGCCGCAAT AAACATACGT GTGCATGTGT 8540

CTTTATAGCA GCATGATTTA TAATCCTTTG GGTATATACC CAGTAATGGG ATGGCTGGGT CAAATGGTAT 8610

TTCTAGTTCT AGATCCTTGA GGAATCACCA CACTGTCTTC CACAATGGTT GAACTAGTTT ACACTCCCAC 8680 

CAACAGTGTA AAAGTGTTCT GGTGCTGGAG AGGATGTGGA CAGCAGTTAT TTTTTTATGA AAATAGTATC 8750 

ACTGAACAAG CAGACAGTTA GTGAAGGATG CGTCAGGAAG CCTGCAGGCC ACACAGCCAT TTCTCTCGAA 8820 

GACTCCGGGT TTTTCCTGTG CATCTTTTGA AACTCTAGCT CCAATTATAG CATGTACAGT GGATCAAGGT 8890

TCTTCTTCAT TAAGGTTCAA GTTCTAGATT GAAATAAGTT TATGTAACAG AAACAAAAAT TTCTTGTACA 8960

CACAACTTGC TCTGGGATTT GGAGGAA-AGT GTCCTCGAGC TGGCGGCACA CTGGTCAGCC CTCTGGGACA 9030

GGATACCTCT GGCCCATGGT CATGGGGCGC TGGGCTTGGG CCTGAGGGTC ACACAGTGCA CCATGCCCAG 9100 

CTTCCTGTGG ATAGGATCTG GGTCTCGGAT CATGCTGAGG ACCACAGCTG CCATGCTGGT AAAGGGCACC 9170 

ACGTGGCTCA GAGGGGGCGA GGTTCCCAGC CCCAGCTTTC TTACCGTCTT CAGTTATTTT TCCCTAAGAG 9240

TCTGAGAAGT GGGGCCGCGC CTGATGGCCT TCGT7CGTCT TCAGCTGGCA CAGAATTGCA CAAGCTGATG 9310 

GTA.AACACTG AGTACTTATA ATGA.-.TGAGG AATTGCTGTA GCAGTTAACT GTAGAGAGCT CGTCTGTTGG 9380

AAAGAAATTT AAGTTTTTCA TTTA.-.CC3CT TTGGAGAATG TTACTTTATT TATGGCTGTG TAAATTGTTT 9450
GACATTCAGT CCCTCGTAGA CAGATAC7AC G' 
ATTTTAGGCT GCTCCTGCGT TTGGTGGATG A' 

CTTCCTCAGG TGAGGCCCGT GCCGTGT3TC TGTGG3GACC TCCACAGCCT GTGGGCTTTG CAGTTGAGCC 9660 

CCCCGTGTCC TGCCCCTGGC ACCGCAGCGT TGTCTCTGCC AAGTCCTCTC TCTCTGCCGG TGCTGGATCC 9730

GCAAGAGCAG AGGCGCTTGG CCGT3CAGCC AGGCCTGGGG GCGCAGGGGC ACCTTCGGGA GGGAGTGGGT 9800

ACC3TGCAGG CCCTGGTCCT GCAG---3--CGC ACCC-.3GTTA CACACGTGGT GAGTGCAGGC GGTGACCTGG 9870 

CTCCTGCTGC TCTTTGGAAA GTCA_-.G^3TG GCGGCTCCTG GGGC3CCAGT GAGACCCCCA GGAGCTGTGC 9940 

ACAGGGCCTG CAGGGCCGAG GCGG3AGCCT CCTC33CAGG GTGCACCTGA GCCTGCGGAG AGCAGGAGCT 10010 

GCT3AGTGAG CTGGCCCACA GCGT7C33TG CGGT'JACGTT CCTGCGTGGG GTTGTTTGGG ATCGGTGGGA 10080 

GAA.TTTGGAT TTGCTGAGTG CTGCTGTCTT GAJ\C7ACGGA GATG3CTAGG AGTGGGTTTC AGAGTTGATT 10150 

TT73TGAATC AAACTA.AAA7 CAGG3-.3AGG GGAC:T33CC TCAGCACAGG GGATTGTCCA ATGTGGTCCC 10220 

CCTCAA.GGGC GCCCCACAGA GCCG3TG3GC TTGTT?3AAA GTGC3ATTTG ACGAGGGACG AGAAACCTTG 10290 

AAA3C7GTAA AGGGAACCCT CAGA^AAATGT GGCCCCC-.3G GGTG3TTTCA GGTGCTTTGC TGGGCTGTGT 10360 

TTGTGAA-AAC CCATTTGGAC CCGC3C73CA AGTC3ACCC7 CCAG3TCCAC CCTCCAGGGC CGCCCTGGGC 10430 

TGG3GGTATG CCTGGCGTTC CTTG73333C A3CC333^GC ACAG^AGGCT GTGCACAT7T AAA.TCCACTA 10500 

AG--7TCACTC GGGGGGAGCC CAGG7CC3AA GCAA;73AGG GCTCAGGAGT CCTGAGGCTG CTGAGGGGAC 10570 

AGA3CAGACG GGGAACGCTG CTTC7G73TG G3A-A377CCT GAGG3TGC7G GCCAGGGAGG TGGCTCAGAG 10640
TGGCCACAGG GAGCTGGGAA TGCACCAGGG GAGCTGCGCA GCTGGCCGAG GTCCCAGGGC CAGGCCACAG 10780
GAAGGGCAGG GGGACGCCCG GGGCCACAGC AGAGGCCGCA GGAAGGGAAG GGGATGCCCA GGCCAGAGCA 10850
GAGGCTACCG GGCACAGGGG GGCTCCCTGA GCTGGGTGAG CGAGGCTCAT GACTCGGCGA GGGAACCTCC 10920
TTGACGTGAA GCTGACGACT GGTGTTGCCC AGCTCACAGC CCAGCCAGGT CCCGCGCCTG AGCAGGAACT 10990
CAGAACCCTC CCCTTTGTCT AAAGCACAGC AGATGCCTTC AGGGCATCTA GGAGAAAACA GGCAAAGTCG 11060
TTGAGAAACG TCTTAAAAGA AGGTGGGATG GTGGCAATTT CTTGTCCAGA TTTTAGTCTG CCCCGGACCA 11130
CAGATGAGTC TATAACGGGA TTGTGGTGTT GCCATGGGGA CACATGAGAT GGACCATCAC AGAGGCCACT 11200
GGGGCTGCAC CTCCCATCTG AGTCCTGGCT GTCCCGGGTC CAGGCCAGGT TCTTGCATGC TCACCTACCT 11270
GTCCTGCCCG GGAGACAGGG AAAGCACCCC GAAGTCTGGA GCAGGGCTGG GTCCAGGCTC CTCAGAGCTC 11340
CTGCCAGGCC CAGCACCCTG CTCCAAATCA CCACTTCTCT GGGGTTTTCC AAAGCATTTA ACAAGGGTGT 11410
CAGGTTACCT CCTGGGTGAC GGCCCCGCAT CCTGGGGCTG ACATTGCCCC TCTGCCTTAG GACCCTGGTC 11480
CGAGGTGTCC CTGAGTATGG CTGCGTGGTG AACTTGCGGA AGACAGTGGT GAACTTCCCT GTAGAAGACG 11550
AGGCCCTGGG TGGCACGGCT TTTGTTCAGA TGCCGGCCCA CGGCCTATTC CCCTGGTGCG GCCTGCTGCT 11620
GGATACCCGG ACCCTGGAGG TGCAGAGCGA CTACTCCAGG TGAGCGCACC TGGCCGGAAG TGGAGCCTGT 11690
GCCCGGCTGG GGCAGGTGCT GCTGCAGGGC CGTTGCGTCC ACCTCTGCTT CCGTGTGGGG CAGGCGACTG 11760
CCAATCCCAA AGGGTCAGAG GCCACAGGGT GCCCCTCGTC CCATCTGGGG CTGAGCAGAA ATGCATCTTT 11830
CTGTGGGAGT GAGGGTGCTC ACAACGGGAG CAGTTTTCTG TGCTATTTTG GTAAAAGGAA ATGGTGCACC 11900
AGACCTGGGT GCACTGAGGT GTCTTCAGAA AGCAGTCTGG ATCCGAACCC AAGACGCCCG GGCCCTGCTG 11970
GGCGTGAGTC TCTCAAACCC GAACACAGGG GCCCTGCTGG GCATGAGTCC CTCTGAACCC GAGACCCTGG 12040
GGCCCTGCTG GGCGTGAGTC TCTCCGAACC CAGAGACTTC AGGGCCCTTT TGGGCGTGAG TCTCTCCGCT 12110
GTGAGCCCCA CACTCCAAGG CTCATCCACA GTCTACAGGA TGCCATGAGT TCATGATCAC GTGTGACCCA 12180
TCAGGGGACA GGGCCATGGT GTGGGGGGGG TCTCTACAAA ATTCTGGGGT CTTGTTTCCC CAGAGCCCGA 12250
GAGCTCAAGG CCCCGTCTCA GGCTCAGACA CAAATGAATT GAAGATGGAC ACAGATGCAG AAATCTGTGC 12320
TGTTTCTTTT ATGAATAAAA AGTATCAACA TTCCAGGCAG GGCAAGGTGG CTCACACCTA TAATCCCAGC 12390
ACTTTGGGAG GCCGAGGTGG GTGGATCACT TGAGGCCAGG AGTTTGAGGC CAACCTAACC AACATAGTGA 12460
AATTCCATTT CTACTTAAAA AATACAAAAA TTAGCCTGGC CTGGTGGCAC ACGCCTGTAG TCCCCGCTAT 12530
GCGGGAGGCT GAGGCAGGAG AATCATTTGA ACCCAGGAGG CAGAGGTTGC AGTGAGCCGA GATCACACCA 12600
CTGCACTCCA GCCTGGGCAA CAGAGTGAGA CTTCATCTTA AAAAAAAAAA AAAAAGTATC AGCATTCCAA 12670
AACCATAGTG GACAGGTGTT TTTTTATTCT GTCCTTCGAT AATATTTACT GGTGCTGTGC TAGAGGCCGG 12740
AACTGGGGGT GCCTTCCTCT GAAAGGCACA CCTTCATGGG AAGAGAAJ\.TA AGTGGTGAAT GGTTGTTAAA 12810
CCAGAGGTTT AAACTGGGGT CCTGTCGTTC TGAGTTAACA GTCCAGATCT GGACTTTGCC TCTTTCCAGA 12880
ATGCTCCCTG GGGTTTGCTT CATGGGGGAG CAGCAGGTGT GGACACCCTC GTGATGGGGG AGCAGCAGGT 12950
GCAGACGCCC TCATGATGGG GGAGTGGCAG GTGCAGACAC CCTTGTGCAT GGTGCCCAGC ATGTCCCTGT 13020
TGCAGCTCCC TCCCCACAAG GATGCCGGTC TCCTGTGCTC CCCACAGTCC CTGCTTCCCT CTCACAGCCT 13090
TACCTGGTCC TGGCCTCCAC TGGCTTTGTC TGCATGATTT CCACATTTCC TGGGCTCCCA GCACCTCTTC 13160
GCCTCTCCCA GGCACCTCTG CAGTGCTGGC CATACCAGTC AGCTGTGAAC TGTCCACTGC TTATTTTGCT 13230
CCCCATGAAA TGTATTTTTT AGGACAGGCA CCCCTGGTTC CAGCCTCTGG CACAGCATCA GTGAATGTTA 13300
TTGAAGGACA AAGGACAGAC AAACAAATCA GGAAAATGGG TTCTCTCTAA ACACATTGCA AAGCCACAGA 13370
GGCTAGTGCA GGATGGGTGG GCATCAGGTC ATCAGATGTG GGTCCAATGC CAGAATATTC TGTGCTCCCA 13440
AAGGCCACTT GGTCAGAGTG TGTGCTTGCA GAGGTGGCTC TAAAAGCTCA GCAGTGGAGG CAGTGGTTCG 13510
CCATACTCAG GGTGAACTCA CATCCTCTGT GTCTGAAGTA TACAGCAGAG GCTTGAAGGG CATCTGGGAG 13580
AAGAAAACAG GCAAAATGAT TAAGAAAAGT GAAAAAGGAA AAGTGGTAAG ATGGGAATTT TCTTGTCCAG 13650
ATTTTAGTCT CCCAAACCAC AGCTCAGATG GTAGAATGTG GTCAGAACTG ATGGACAGAA CAATAGAACA 13720
AAACGGAAGC CCTATCTCTC AGAAACGTGT GTTAATGTGG TATGTGGCAC AGCTGATGGA AAAGAGAGTG 13790
TGTGTGTAAT TTTTTTTTCT GAGAAAACTG ACTGGAAGCA AATAAGTTGT GTCTTTACAG CATATACCAG 13860
AGCAGATTCT AGGTAGAAGA GGAGACACAT GCAAACAACA CCAGCAACAG AAATAAAACA AAAGACTCAA 13930
AGGGAAGGGA GGTGAACGTT CCCTGGTTTG GTGTTGGGGA AGGACACACA GGGAGGCGGA TGAAACCAGT 14000
GAGGCAACGG GCATTGCTTT CACTGCAGAG AAACTCAGCT TGCCTGAGCC ACAGTGAAAA TGGCCATTCC 14070
CTGGAGCGTT TGTGCACGTG ATTTATTTAA GGCGCCCTGT GAGGTCCTGC ACATTCATCC TCTCACTTTG 14140
TTCTCCTAAC CACCTGAGAG GTAGAGGAGG AAAGGCTCCA GGGGAGCAGC CGCCCTTGGT CACCCAGCTG 14210
GCAAAGGGCA TGCATGATTG CAGCCTGGCC TCCTGCTCCG GGGCCCTTGC TCTGCCCGAG GACCCCACAC 14280
AAGTCAGACC CATAGGCTCA GGGTGAGCCG GAGCCCAAGG TCGTGTTGGG GATGGCTGTG AAAGAAGAAA 14350
TGGACGTCTG ATGCACACTT GGGAAGGTCC TACCAGCAGC GTCAAAGAAA TGCATGTGAA ACTGACAGCG 14420
AGACCCATCC CTCAAAGAAA CGCACGTGAA ACTGATGGCG AGACCTGTCC CCATCCCTCA TGCTGGCTCC 14490
TTTTCTGGGC TTGCCAAGAG CCAGCATCAG GTTGAGGCAA GCTGGAAAGA CTTTTCTGGA AAGCAGCTTG 14560
TTTGCATGGA AGTCCTCACA ATGTCCTGTG TCTTCCCAGT AATTCCACTT CTGAAGTGAC CAGACATTAT 14630
CACGGGTCTT ATTTACCATT TCCAGTGTTC CAGGCAGGGG GACTTGCCAC AGCAAGTCAC GAACCTGCCC 14700
AAATACAGGG CTAAGGAGAT ATTATGCATC ACAAAACTTG CTCTGCCATT AAACATTTTT CAAAGAATTT 14770
TTGAAGAATG TTTAATGGCA CAAAACGTTT ATTTCAATGT AGCAGTGTTC AAAGCTGGAT GTAAAAGA.;\C 14840
ACACCCCAGG AGCCTGCCGT GAATGTCATG TGTGTTCATC TTTGGACATG GACATACATG GGCAGTGAGT 14910
GGTGGTGAGG CCCTGGAGGA CATCGGTGGG ATGCCTCCAT CCTGCCCCTC TGGAGACACC ATGTGTGCCA 14980
CGTGCACTCA CTGGAGCCCT GTTTAGCTGG TGCCACCTGG CTCTTCCATC CCTGAGATTC AAAGAGAGTG 15050
AGATTCCCCA CGCCCAACTC AGTGTTCTCC CACAAAAAAC CTGAGTCACA CCTGTGTTCA CTCGAGGGAC 15120
GCCCGGGAGC CAGGGCTCCA CAGTTTATTA TGTGTTTTTG GCTGAGTTAT GTGCAGATCT CATCAGGGCA 15190
GATGATGAGT GCACAAACAC GGCCGTGCGA GGTTTGGATA CACTCAACAT CACTAGCCAG GTCCTGGTGG 15260
AGTTTGGTCA TGCAGAGTCT GGATGGCATG TAGCATTTGG AGTCCATGGA GTGAGCACCC AGCCCCCTCG 15330
GGCTGCAGCG CATGCCCCAG GCAGGACAAG GAAGCGGGAG GAAGGCAGGA GGCTCTTTGG AGCAAGCTTT 15400
GCAGGAGGGG GCTGGGTGTG GGGCAGGCAC CTGTGTCTGA CATTCCCCCC TGTGTCTCAG CTATGCCCGG 15470
ACCTCCATCA GAGCCAGTCT CACCTTCAAC CGCGGCTTCA AGGCTGGGAG GAACATGCGT CGCAAACTCT 15540
TTGGGGTCTT GCGGCTGAAG TGTCACAGCC TGTTTCTGGA TTTGCAGGTG AGCAGGCTGA TGGTCAGChC 15610
AGAGTTCAGA GTTCAGGAG3 TGTGTGCGCA AGTATGTGTG TGTGTGTGTG CGCGCGTGCC TGCAAGGCTG 15680
ATGGTGACTG GCTGCACGTA AGAGTGCACA TGTACGCATA TACACGTGAG GACATACATG TGTGCATGTG 15750
TGTACATGAA GGCATGGCAG TGTGTGCACA GGTGTGCAAG GGCACAAGTG TGTGCACATG CGAATGCACA 15820
CCTGACATGC ATGTGTGTTC GTGCACAGTC GTGTGGGCAT TCACGTGAGG TGCATGCGTG TGGGTGTGCA 15890
GTGTGAGTAG CATGTGTGCA CATAACATGT ATTGAGGGGT CCTCGTGTTC ACCCCGCTAG GTCCTCAGCA 15960
CCAGTGCCAC TCCTTACAGG ATGAGACGGG GTCCCAGGCC TTGGTGGGCT GAGGCTCTGA AGCTGCAGCC 16030
CTGAGGGCAT TGTCCCATCT GGGCATCCGC GTCCACTCCC TCTCCTGTGG GCTTCTGTGT CCACTCCCCC 16100
TCTCCTGTGG GCATTTACAT CCACTCCACT CCCTCTCTCC TGTGGGCATC CGCGTCCACT CCCCCTCT^T 16170



Le A 32 805-Foreign Countries 






GTGGGCATCT GCGTCCACCT CCCCTCTCTG TGGGCATTTG CGTCCACTCC CTCTCCTGGT TCCTTCCTGT 152 40 
CTTGGCCGAG CCTCGGGGGC AGGCAGATGA CACAGAGTCT TGACTCGCCC AGGGTGGTTC GCAGCTGCCG 16310 
GGTGAGGGCC AGGCCGGATT TCACTGGGAA GAGGGATAGT TTCTTGTCAA AATGTTCCTC TTTCTTGTTC 16380 
CATCTGAATG GATGATAAAG CAAAAAGTAA AAACTTAAAA TCCCAGAGAG GTTTCTACCG TTTCTCACTC 16450 
TTTCTTGGCG ACTCTAGGTG AACAGCCTCC AGACGGTGTG CACCAACATC TACAAGATCC TCCTGCTGCA 16520

GGCGTACAGG TGAGCCGCCA CCMGGGGTG CAGGCCCAGC CTCCAGGGAC CCTCCGCGCT CTGCTCACCT 16590 
CTGACCCGGG GCTTCACCTT GGAACTCCTG GGTTTTAGGG GCAAGGAATG TCTTACGTTT TCAGTGGTGC 16660 
TGCTGCCTGT GCACAGTTCT GTTCGCGTGG CTCTGTGCAA AGCACCTGTT CTCCATCTCT GGGTAGTGGT 16730 
AGGAGCCGGT GTGGCCCCAG GTGTCCCCAC TGTGCCTGTG CACTGGCCGT GGGACGTCAT GGAGGCCATC 16800 
CCAGGGCAGC AGGGGCATGG GGTAAAGAGA TGTTTATGGG GAGTCTTAGC AGAGGAGGCT GGGAAGGTGT 16870

CTGAACAGTA GATGGGAGAT CAGATGCCCG GAGGATTTGG GGTCTCAGCA AAGAGGGCCG AGGTGGGTGC 16940 
AGGTGAGGGT CGCTGGCCCC ACCCCCGGGA AGGTGCAGCA GAGCTGTGGC TCCCCACACA GCCCGGCCAG 17010 
CACCTGTGCT CTGGGCATGG CTGTGCTCCT GGAACGTTCC CTGTCCTGGC TGGTCAGGGG GTGCCCCTGC 17 080 
CAAGAATCGA CAACTTTATC ACAGAGGGAA GGGCCAATCT GTGGAGGCCA CAGGGCCAGC TTCTGCCTGG 17150 
AGTCAGGGCA GGTGGTGGCA CAAGCCTCGG GGCTGTACCA AAGGGCAGTC GGGCACCACA GGCCCGGGCC 17220

TCCACCTCAA CAGGCCTCCC GAGCCACTGG GAGCTGAATG CCAGGAGGCC GAAGCCCTCG CCCCATGAGG 17290 
GCTGAGAAGG AGTGTGAGCA TTTGTGTTAC CCAGGGCCGA GGCTGCGCGA ATTACCGTGC ACACTTGATG 17 360 
TGAAATGAGG TCGTCGTCTA TCGTGGAAAC CCAGCAAGGG CTCACGGGAG AGTTTTCCAT TACAAGGTCG 17 4 30 
TACCATGAAA ATGGTTTTTA ACCCGAGTGC TTGCGCCTTC ATGCTCTGGC AGGGAGGGCA GAGCCACAGC 17 500 
TGCATGTTAC CGCCTTTGCA CCAGCTCCAG AGGCTTGGGA CCAGGCTGTC TCAGTTCCAG GGTGCGTCCG 17570

GCTCAGACCG CCCTCCTCTC TGCCTTCTCT CTCTGCCTCA AATCTTCCCT CGTTTGCATC TCCCTGACGC 17 640 
GTGCCTGGGC CCTCGTGCAA GCTGCTTGAC TCCTTTCCGG AAACCCTTGG GGTGTGCTGG ATACAGGTGC 17710 
CACTGAGGAC TGGAGGTGTC TGACACTGTG GTTGACCCCA GGGTCCAGCT GGCGTGCTTG GGGCCTCCTT 17780 
GGGCCATGAT GAGGTCAGAG GAGTTTTCCC AGGTGAAAAC TCCTGGGAAA CTCCCAGGGC CATGTGACCT 17850 
GCCACCTGCT CCTCCCATAT TCAGCTCAGT CTTGTCCTCA TTTCCCCACC AGGGTCTCTA GCTCCGAGGA 17920

GCTCCCGTAG AGGGCCTGGG CTCAGGGCAG GGCGGCTGAG TTTCCCCACC CATGTGGGGA CCCTTGGGTA 17 990 
GTCGCTTGAT TGGGTAGCCC TGAGGAGGCC GAGATGCGAT GGGCCACGGG CCGTTTCCAA ACACAGAGTC 18 0 60 
AGGCACGTGG AAGGCCCAGG AATCCCCTTC CCTCGAGGCA GGAGTGGGAG AACGGAGAGC TGGGCCCCGA 18130 
TTTCACGGCA GCCAGGCTGC AGTGGGCGAG GCTGTGGTGG TCCACGTGGC GCTGGGGGCG GGGTCTGATT 18200 
CAAATCCGCT GGGGCTCGGC CTTCCTGGCC CGTGCTGGCC GCGCCTCCAC ACGGGCTTGG GGTGGACGCC 18270

CCGACCTCTA GCAGGTGGCT ATTTCTCCCT TTGGAAGAGA GCCCCTCACC CATGCTAGGT GTTTCCCTCC 18340 
TGGGTCAGGA GCGTGGCCGT GTGGCAACCC CGGGACCTTA GGCTTATTTA TTTGTTTAAA AACATTCTGG 18 410 
GCCTGGCTTC CGTTGTTGCT AAATGGGGAA AAGACATCCC ACCTCAGCAG AGTTACTGAG AGGCTGAAAC 18480 
CGGGGTGCTG GCTTGACTGG TGTGATCTCA GGTCATTCCA GAAGTGGCTC AGGAAGTCAG TGAGACCAGG 18 550 
TACATGGGGG GCTCAGGCAG TGGGTGAGAT GAGGTACACG GGGGGCTCAG GCAGTGGGTG AGGCCAGGTA 18620

CATGGGGGGC TCAGGCACTG GGTGAGATGA GGTACACGGG GGGCTCAGGC AGAGGGTCAG ACCAGGTACA 18 690 
C&GGGGCTCT GATCACACGC ACATATGAGC ACATGTGCAC ATGTGCTGTT TCATGGTAGC CAGGTCTGTG 18 7 60 
CACACCTGCC CCAAAGTCCC AGGAAGCTGA GAGGCCAAAG ATGGAGGCTG ACAGGGCTGG CGCGGTGGCT 18830 
CACACCTGTA GTCCCAGCAC TTTGGGAGGC CGAGGCGAGA GGATCCCTTG AGCCCAGGAG TTTAAGACCA 18900 
GCCTGAGCAA CATAGTAGAA CCCCATCTCT ATGAAAAATA AAAACAAAAA TTAGCTGAAC ATGGTGGTGT 18970

GCGCCTGTAG TTCCAATACT TGGGAGGCTG AAGTGGGAGG ATCACTTGAG CCCAGGAGGT GGAAGCTGCA 19040 
G7GAGCTGAG ATTGCACCAC TGTACTGCAG CCTGGGTGAC AGAGTGAGAG CCCATCTCAA CAACAACAAA 19110 
G;ijiGACTGAC AAATGCAGTT TCTTGGAAAG AAACATTTAG TAGGAACTTA ACCTACACAC AGAAGCCAAG 19180 
TCGGTGTCTC GGTGTCAGTG AGATGAGATG ATGGGTCCTC ACACCATCAC CCCAGACCCA GGGTTTATGC 19250 
ACCACAGGGG CGGGTGGCTC AGAAGGGATG CGCAGGACGT TGATATACGA TGACATCAAG GTTGTCTGAC 19320

GP-^GGGCAGG ATTCATGATA AGTACCTGCT GGTACACAAG GAACAATGGA TAAACTGGAA ACCTTAGAGG 19390 
CCTTCCCGGA ACAGGGGCTA ATCRGAAGCC AGCATGGGGG GCTGGCATCC AGGATGGAGC TGCTTCAGCC 194 60 
TCCACATGCG TGTTCATACA GATGGTGCAC AGAAACGCAG TGTACCTGTG CACACACAGA CACGCAGCTA 19530 
CTCGCACACA CAAGCACACA CACAGACATG CATGCATGCA TCCGTGTGTG TGCACCTGTG CCCATGAGGA 19600 
50 Aii.CCCATGCA TGTGCATTCA TGCACGCACA CAGGCACCGG TGGGCCCATG CCCACACCCA CGAGCACCGT 19670 

CTGATTAGGA GGCCTTTCCT CTGACGCTGT CCGCCATCCT CTCAGGTTTC ACGCATGTGT GCTGCAGCTC 1974 0 
CCATTTCATC AGCAAGTTTG GAAGAACCCC ACATTTTTCC TGCGCGTCAT CTCTGACACG GCCTCCCTCT 19810 
GCTACTCCAT CCTGAAAGCC AAGAACGCAG GTATGTGCAG GTGCCTGGCC TCAGTGGCAG CAGTGCCTGC 1988 0 
CTGCTGGTGT TAGTGTGTCA GGAGACTGAG TGAATCTGGG CTTAGGAAGT TCTTACCCCT TTTCGCATCA 19950 
55 GGPAGTGGTT TAACCCAACC ACTGTCAGGC TCGTCTGCCC GCCCTCTCGT GGGGTGAGCA GAGCACCTGA 2 002 0 

TGGAAGGGAC AGGAGCTGTC TGGGAGCTGC CATCCTTCCC ACCTTGCTCT GCCTGGGGAA GCGCTGGGGG 2 0090 
GCCTGGTCTC TCCTGTTTGC CCCAT3GTGG GATTTGGGGG GCCTGGCCTC TCCTGTTTGC GCTGTGGTGG 20160 
GATTGGGCTG TCTCCCGTCC ATGGCACTTA GGGCCCTTG7 GCAAACCCAG GCCAAGGGCT TAGGAGGAGG 2 02 30 
CCAGGCCCAG GCTACCCCAC CCCTCTCAGG AGCAGAGGCC GCGTATCACC ACGACAGAGC CCCGCGCCGT 20300 
CCTCTGCTTC CCAGTCACCG TCGTCTGCCC CTGGACACTT TGTCCAGCAT CAGGGAGGTT TCTGATCCGT 20370

CTGAAATTCA AGCCATGTCG AACCTGCGGT CCTGAGCTTA ACAGCTTCTA CTTTCTGTTC TTTCTGTGTT 20440 
GT3GAAATTT CACCTGG2.GA AGCCC-AA3AA AACATTTCT3 TCGTGACTCC TGCGGTGCTT GGGTCGGGAC 20510 
AGCCAGAGAT GGAGCC^CCC CGCAG 
TG3GCTGGGC CTGTGACTCC TCAGC- 
65 GGCCCTCTGC CCTCCG-.GGC CGTGC.-.C7GG CTGTGCCACC AAGCATTCCT GCTCAAGCTG ACTCGACACC 2 072 0 

GTGTCACCTA CGTGCC-CTC CTGG3C7CAC TCAG3ACAGG C^AGTGTGGG TGGAGGCCAG TGCGGGCCCC 20790 
AC3TGCCCAG GGGTCA7CCT TGA.-C3CCCT GTGT3GGGC3 AGCA3CCTCA GATGCTGCTG AAGTGCAGAC 2 0860 
GCCCCCGGGC CTGACC3TGG GGGCCTG3AG CCACGCTGGC AGCCCTATGT GATTAAACGC TGGTGTCCCC 20930 
AG33CACGGA GCCTGGCAGG GTCC3C-ACT TCTT3AACCC CTGCTTCCCA TCTCAGGGGC GATGGCTCCC 21000 
70 CA33CTTGGG AGCCTT3TGA CCCCTG-CCT GTGT3CTCTC ACAGCCTCTT CCCTGGCTGC TGCCCTGAGC 21070 

TC3TGGGGTC CTGAGCi=.-GT TCTCTC^CCG CCCC3C3GCT CCAC3GTCAC TGGGCTGCCT GTCTGCTCGC 21140 

: CAGrCAGGGC CACGAGGTGC AGGCCCTGCC 21210 
; CACCTCTGGC CTCTTCTGGA ACGGAGTCTG 21280 
' CCC3GGGACG ACGCTGACTG CCCTGGAGGC 21350 

' ACGTCCCAGG GAGGGAGGGG CGGCCCACAC 214 90 
; TTTGGCCGAG GCCTGCATGT CCGGCTGAAG 215 60 



ATTTTGGCCC CGCAGCCC^G ACGCAGITGA GTCGC 
GC7GAGAGCA GACACC=-3C^ GC3CT3:CAC GCCG;
CACTTCCCCfl CAGGCTGGCG CTCGGCTCCA CCCCAGGGCC
CTCCCCACAT AGGAATAGTC CATCCCCAGA TTCGCCATTG 
CCACCCCCAC CATCCAGGTG GAGACCCTGA GAAGGACCCT 
GTGTGCCCTG TACACAGGCG AGGACCCTGC ACCTGGATGG 
GCTGTGGGAG TAAAATACTG AATATATGAG TTTTTCAGTT 
GTGCACTGCA TAGACACCAC TGTATGCAAT TACAGAAGCC 
GCCCATGGCC TGGCTGTGCA TTTACGGAAG TCTATGAGTG 
CCTGGCTGGG CCTGGGAGGT TTCTGATGCT GTGAGGCAGG 
GAGCCCCCAC CCTGGAAGAC ATAACAGTAA GTCCAGGCCC 
TTGGGCGGCG GGGATGATGG AGGGCCTGGC CAGGGTGGCA 
GGGGTGATGG GGGGGGCTGG TCTGGGTGGC GGGGAAGATG 
GCCTCCCACC TGCAGCCGTG GATCCGGATG TGCTTCCCTG 
GGAGGTGGGG GGCAGGGGCA TGACACCATC CTGTATAAAA 
TCAGGTTGAA AGTCACATTC CGCCTCTGGC CATTCTCTTA 
GTGGGTAGGG TGGGGCAGTG GAGGGTGTGG ACACAGGAGG 
TCCTCTTATC ATCTCCCAGT CTCATCTCTC ATCCTCTTAT 
TCTCCCAGTC TCATCTGTCA TCCTCTTACC ATCTCCCAGT 
CATCCAGACT TACCTCCCAG GGCGGGTGCC AGGCTCGCAG 
GAAGGAACTG GAAGGATTGC AGAGAACAGG AGGGGCGGCT 
AGCCCCTCCT CAGAAGTTGG CTTGGGCCAC ACGAAACCGA 
CAGCAGGTCC CTGGTGGGGC CTTATGGTAT GGCCGGGTCC 
TTTGAGTGCA GCCCGGACGT GCCTGGTGTC GGGGTGGGGG 
TGCTGCTGCT TCAGAGAATG TCTGAGTGAC CGAGCCTAAT 
GTCGTAAATG CACTCTGGTG CCTGGAGCCC CCGTATAGGA 
GGCCTGGGGG CGCCTTTGCC CTGCAAACTG GAAGGGAGCG 
GTGAGAGGTT GGACAGAACA GGGCGGGGAC TTCCCAGGAG 
TGAATCACAG ACCAACaGGT CAGGCCATTG TTCAGCTATC 
CTCCGGGTGT TTTTTGTTGA AATTTTACTC AGGATTACTT 
AAAAGGTATT TGCTTTGATA TGGCTTAACT CACTAAGCAC 
TATTATTATT ATTAGAGATG GTGTCTACTC TGTCACCCAG 
GCTGTAGCCG CAAACCCCCA GGCTCAAGTG ATCCTCCGGC 
TGTGAGCCAC TGCCCTTGCC TGGCACTTTT AAAAACCACT 
CTGTCATCCC AGTAGTTTGG GAAGCCGAGG CAGAAGGATT 
GGTAACATAG GGAGACCCCA TCTCTACAAA AAATGCAAAA 
AGTCCCAGCT GCTCGGGAGG CTGAGTGGGA GGATCGCTTG 
TGATTGTACC ATCGCACTCC AGCCTGGGCA ACAGAGTGAG 
AAGGAGAAGG AGAAGAGAAG AAGAAGGAAG AAGGAAAGAG 
AAGGAGGCCT GCTAGGTGCT AGGTAGACTG TCAAATCTCA 
GGAAAGAAAA ACCCCAGCTC TTTGGACTTC CTTAGGCCTG 
ACAAGCGTGT ATGGAGCGAG TGAGTTCAAA GCAGAAAGGG 
GTGACACCAG CCAGGACCCC TGAAAGGGAG TGGTTGTTTT 
TGCACCTGCT GTAACCGTCG ATGTTGGTGC CAGGTGCCCA 
CAAACTTTGG TGGGTTTCAG AAGCCCCAGG CACTTGTGGC 
CCCACGTCCT TCTCCTGGAA CCTGTGAATG TGTCACCCGC 
AATCACGGCT GCCAGTCAGC CGATCTTAAG GTCATCCTGG 
GGTCCCTAGA AGTGAGAGAG GGAGGCAGGG GAGAGTCAGA 
GCTGGCTTTG AGATGGAGGA GGGGGTCCCC AGCCAAGGAA 
AGCAATCCTC CCCGGTCCTG AGGGCACACG GCCCTGCCCA 
TCAGCTTTCC GGCCTCCAGA GCTGTAAGAT GATGCGTTTG 
ACAGCAGCAA ATGGAATAGC AGTACAGGGA AATGAATACA 
CCCCTGGG 



AGCTTTTCCT CACCAGGAGC CCGGCTTCCA 2170 0 
TTCACCCCTC GCCCTGCCCT CCTTTGCCTT 21770 
GGGAGCTCTG GGAATTTGGA GTGACCAAAG 2184 0 
GGGTCCCTGT GGGTCAAATT GGGGGGAGGT 21910 
TTGAAAAAAA TCTCATGTTT GAATCCTAAT 21980 
TGTGAGTGAA CGGGGTGGTG GTCAGTGCGG 22050 
AATGGGGTTG TGGTCAGTGC GGGCCCATGG 22120 
AGGGGAAGGA GGGTAGGGGA TAGACAGTGG 22190 
GAAGGGCAGC AGGGATGCTG GGGGCCCAGC 222 60 
GGGATGATGG GGGGCCCAGC TGGGGTGGCA 22330 
GGGAAGCCTG GCTGGGCCCC CTCCTCCCCT 22400 
GTGCACATCC TCTGGGCCAT CAGCTTTCAT 2247 0 
TCCAGGATTC CTCCTCCTGA ACGCCCCAAC 22540 
AGAGTAGACC AGGATTCTGA TCTCTGAAGG 22610 
CTTCAGGGTG GGGCTGGTGA TGCTCTCTCA 22 680 
CATCTCCCAG TCTCATCTGT CTTCCTCTTA 22750 
CTCATCTCTT ATCCTCTTAT CTCCTAGTCT 22820 
TGGAGCTGGA CATACGTCCT TCCTCAGGCA 22890 
CAGAGGGACG CAGTCTTGGG GTGAAGAAAC 22960 
GGGCCCTGCG TGAGTGGCTC CAGAGCCTTC 23030 
TACTGAGTGC ACCTTGGACA GGGCTTCTGG 2 3100 
CTTATGGCCA CTGGATATGG CGTCATTTAT 23170 
GTGTATGGTG GGCCCAAGTC CACAGACTGT 23240 
GCTGTGAGGA AGGAGGGGCT CTTGGCAGCC 23310 
GCCCCGGGCG CCGTGGGCGG ACGACCTCAA 23380 
CAGAGGCCGC TGCTCAGGCA CACCTGGGTT 23450 
CATCTTCTAC AAAGCTCCAG ATTCCTGTTT 23520 
ATATTTTTTG CTAAAGTATT AGACCCTTAA 2 35 90 
CTACTTTATT TGTCTGTTTT TATTTATTAT 23660 
GTTGTTAGTG CAGTGGCACA GTCATGGCTC 2 3730 
CTCAGCTTCC CAGAGTGCTG GGATTACAGG 2 3800 
ATGTAAGGTC AGGTCCAGTG GCTTCCACAC 23870 
GTCTGAGGCC AGGAGTTTGA GACCAGCATG 23940 
AGTTATCCGG GCGTGGGGTC CAGCATCTGT 2 4010 
AGCCCGGGAG GTCATGGCTG CAGTGAGCTG 2 4 080 
ACCCTGTCTC AAAAAAAAAA AAAAAAAAAG 24150 
AAGAAGAAGG AAGAAGGAAG AAAGAAGGAG 2 4 220 
GAGCAAAATG AAAATAACAA AGTTTTAAAG 24 290 
AACTTCATCT CAAGCAGCTT CCTTCCACAG 24 360 
AGGAGAAGCA GGCAAGGGTG GAGGGTGTGG 2 4 430 
CCTGCCTCAG CCCCACGCTC CTGCCGGTCC 2 4 500 
CCTGGGAAGG ATGCTGTGCA GGGGGCTTGC 24570 
AGGCACAATT ACAGCCCCTC CCCAAJiGATG 2 4 640 
AAGGCAGAGG CTGGTGAAGG CTGCAGGTGG 24710 
ATTATCTGGT GGGCCTGATA TGGCCACAAG 24 780 
GAGGGGACGT GAGAAGGACC ACTGGCCACT 24850 
TGGGGGCAGC CGCTCCATGC TGGAAAAGCA 24920 
CGCCTCGATT TCAGGCCAGT GGGACCTGTT 24990 
TGTTCAGCCA CTAAGCTGCA GTGATTCGTC 2 50 60 
GGGACAGTTC TCAGAGTGAC TCTCAGCCCA 25130 



Example 5 



Comparison of the above-described genomic hTC sequence and the sequence of the
hTC cDNA (Fig. 6; corresponding to SEQ ID NO 2) made it possible to elucidate the
exon-intron structure of the hTC gene. The genomic organization of the hTC gene is
illustrated diagrammatically in Fig. 7. The coding region of the hTC gene is
composed of 16 exons which vary in size between 62 bp and 1354 bp (see Table 1).
Exon 1 contains the translation start codon ATG. The translation stop codon TGA
and the 3'-untranslated region lie on exon 16 (Fig. 8). No possible polyadenylation
signal (AATAAA) was found either in exon 16 or in the 3195 bp of the following
3'-flanking region. The exon-intron transitions were determined on the basis of the
consensus sequence

                5'-Exon        Intron                  3'-Exon

Pre-mRNA        A/C  A   G  |  G    T    A/G  A  ...  N  C   A    G   |  G

Frequency (%)   70   60  80    100  100  95   70         80  100  100    60

and listed in Table 1. With the exception of the 5' splice site between exon 15 and
intron 15, all the exon-intron transitions are in accord with the published (Shapiro
and Senapathy, 1987) splice consensus sequence. The sizes of the introns are
between 104 bp and 8616 bp. Since only part of intron 6 was isolated, it is not
possible to determine the precise length of the hTC gene. Based on the partial sequence
of approximately 4660 bp which was obtained from intron 6, the minimum size of the hTERT
gene is 37 kb.

Introns 1-5 and the 5' region of intron 6 are contained in contig 1:
Intron 1: bp 11493-11596 (SEQ ID NO 4);
Intron 2: bp 12951-21566 (SEQ ID NO 5);
Intron 3: bp 21763-23851 (SEQ ID NO 6);
Intron 4: bp 24033-24719 (SEQ ID NO 7);
Intron 5: bp 24900-25393 (SEQ ID NO 8);
5' region of intron 6: bp 25550-26414 (SEQ ID NO 9).

The 3' region of intron 6, and introns 7-15, are located in contig 2 at the following
positions:

3' region of intron 6: bp 1-3782 (SEQ ID NO 10);
Intron 7: bp 3879-4858 (SEQ ID NO 11);
Intron 8: bp 4945-7429 (SEQ ID NO 12);
Intron 9: bp 7544-9527 (SEQ ID NO 13);
Intron 10: bp 9600-11470 (SEQ ID NO 14);
Intron 11: bp 11660-15460 (SEQ ID NO 15);
Intron 12: bp 15588-16467 (SEQ ID NO 16);
Intron 13: bp 16530-19715 (SEQ ID NO 17);
Intron 14: bp 19841-20621 (SEQ ID NO 18);
Intron 15: bp 20760-21295 (SEQ ID NO 19).

The 3'-untranscribed region is also located in contig 2 at position 21960-25138 (SEQ
ID NO 20).
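
The intron sizes quoted in Example 5 can be recomputed from the coordinates listed above. The following Python sketch is an illustration only and is not part of the original disclosure: the names CONTIG_1, CONTIG_2, span_length and enclosed_exon_lengths are hypothetical, coordinates are assumed to be 1-based and inclusive, and the gap between two consecutive introns of the same contig is assumed to be exactly one exon.

```python
# Intron coordinates (bp) copied from the lists above.
# Assumption: positions are 1-based and inclusive within each contig.
CONTIG_1 = [
    ("intron 1", 11493, 11596),
    ("intron 2", 12951, 21566),
    ("intron 3", 21763, 23851),
    ("intron 4", 24033, 24719),
    ("intron 5", 24900, 25393),
    ("intron 6 (5' region)", 25550, 26414),
]
CONTIG_2 = [
    ("intron 6 (3' region)", 1, 3782),
    ("intron 7", 3879, 4858),
    ("intron 8", 4945, 7429),
    ("intron 9", 7544, 9527),
    ("intron 10", 9600, 11470),
    ("intron 11", 11660, 15460),
    ("intron 12", 15588, 16467),
    ("intron 13", 16530, 19715),
    ("intron 14", 19841, 20621),
    ("intron 15", 20760, 21295),
]


def span_length(start, end):
    """Length in bp of an inclusive 1-based interval."""
    return end - start + 1


def intron_lengths(contig):
    """Map each listed intron (or intron fragment) to its length in bp."""
    return {name: span_length(start, end) for name, start, end in contig}


def enclosed_exon_lengths(contig):
    """Lengths of the exons lying between consecutive introns of one contig."""
    return [nxt[1] - prev[2] - 1 for prev, nxt in zip(contig, contig[1:])]


# Introns that were sequenced completely (the two intron 6 fragments are excluded).
complete_introns = {
    name: length
    for contig in (CONTIG_1, CONTIG_2)
    for name, length in intron_lengths(contig).items()
    if "region" not in name
}

# Exons 2 to 15; exons 1 and 16 are not bounded by introns on both sides.
internal_exons = enclosed_exon_lengths(CONTIG_1) + enclosed_exon_lengths(CONTIG_2)

if __name__ == "__main__":
    for name, length in complete_introns.items():
        print(f"{name}: {length} bp")
    print("smallest/largest complete intron:",
          min(complete_introns.values()), max(complete_introns.values()))
    print("smallest/largest internal exon:",
          min(internal_exons), max(internal_exons))
```

Under these assumptions the extremes agree with the ranges stated in the text for introns and exons; exons 1 and 16 are not covered because their outer boundaries are not given in the lists.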

The individual sequences of the abovementioned introns are as follows:

Intron 1 (SEQ ID NO 4)

GTGGGCCTCCCCGGGGTCGGCGTCCGGCTGGGGTTGAGGGCGGCCGGGGGGAACCAGCGACATGCGGAGAGCAGCGCAGG 
CGACTCAGGGCGCTTCCCCCGCAG 
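
As a plausibility check against the splice consensus described in Example 5, an intron is expected to begin with GT and end with AG. The sketch below applies that test to the intron 1 sequence as printed above; the names INTRON_1 and has_gt_ag_boundaries are illustrative assumptions, and the check covers only the two invariant dinucleotides, not the full consensus.

```python
# Intron 1 (SEQ ID NO 4), transcribed from the two lines printed above.
INTRON_1 = (
    "GTGGGCCTCCCCGGGGTCGGCGTCCGGCTGGGGTTGAGGGCGGCCGGGGGGAACCAGCGACATGCGGAGAGCAGCGCAGG"
    "CGACTCAGGGCGCTTCCCCCGCAG"
)


def has_gt_ag_boundaries(intron):
    """Return True if the intron starts with GT and ends with AG."""
    seq = intron.upper()
    return seq.startswith("GT") and seq.endswith("AG")


if __name__ == "__main__":
    print("length:", len(INTRON_1), "bp")
    print("GT...AG boundaries:", has_gt_ag_boundaries(INTRON_1))
```

The length of the transcribed string can be compared with the interval bp 11493-11596 given for intron 1 in contig 1.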

Intron 2 (SEQ ID NO 5) 

GTGAGGAGGTGGTGGCCGTCGAGGGCCCAGGCCCCAGAGCTGAATGCAGTAGGGGCTCAGAAAAGGGGGCAGGCAGAGCC 
CTGGTCCTCCTGTCTCCATCGTCACGTGGGCACACGTGGCTTTTCGCTCAGGACGTCGAGTGGACACGGTGATCTCTGCC 
TCTGCTCTCCCTCCTGTCCAGTTTGCATAAACTTACGAGGTTCACCTTCACGTTTTGATGGACACGCGGTTTCCAGGCGC 
CGAGGCCAGAGCAGTGAACAGAGGAGGCTGGGCGCGGCAGTGGAGCCGGGTTGCCGGCAATGGGGAGAAGTGTCTGGAAG 
CACAGACGCTCTGGCGAGGGTGCCTGCAGGTTACCTATAATCCTCTTCGCAATTTCAAGGGTGGGAATGAGAGGTGGGGA 
CGAGAA.CCCCCTCTTCCTGGGGGTGGGAGGTAAGGGTTTTGCAGGTGCACGTGGTCAGCC7U\TATGCAGGTTTGTGTTTA 
AGATTTAATTGTGTGTTGACGGCCAGGTGCGGTGGCTCACGCCGGTAATCCCAGCACTTTGGGAAGCTGAGGCAGGTGGA 
TCACCTGAGGTCAGGAGTTTGAGACCAGCCTGACCAACATGGTGAAACCCTATCTGTACTAAAAATACAAAAATTAGCTG 
GGCATGGTGGTGTGTGCCTGTAATCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATCACTTGAACCCAGGAGGCGGAGGC 
TGCAGTGAGCTGAGATTGTGCCATTGTACTCCAGCCTGGGCGACAAGAGTGAAACTCTGTCTTTAAAAAAAAAAAGTGTT 
CGTTGATTGTGCCAGGACAGGGTAGAGGGAGGGAGATAAGACTGTTCTCCAGCACAGATCCTGGTCCCATCTTTAGGTAT 
GAAGAGGGCCACATGGGAGCAGAGGACAGCAGATGGCTCCACCTGCTGAGGAAGGGACAGTGTTTGTGGGTGTTCAGGGG 
ATGGTGCTGCTGGGCCCTGCCGTGTCCCCACCCTGTTTTTCTGGATTTGATGTTGAGGAACCTCCGCTCCAGCCCCCTTT 
TGGCTCCCAGTGCTCCCAGGCCCTACCGTGGCAGCTAGAAGAAGTCCCGATTTCACCCCCTCCCCACAAACTCCCAAGAC 
ATGTAAGACTTCCGGCCATGCAGACAAGGAGGGTGACCTTCTTGGGGCTCTTTTTTTTCTTTTTTTCTTTTTATGGTGGC 
AAAAGTCATATAACATGAGATTGGCACTCCTAJ\CACCGTTTTCTGTGTACAGTGCAGAATTGCTAACTCGGCGGTGTTTA 
CAGCAGGTTGCTTGAAATGCTGCGTCTTGCGTGACTGGAAGTCCCTACCCATCGAACGGCAGCTGCCTCACACCTGCTGC 
GGCTCAGGTGGACCACGCCGAGTCAGATAAGCGTCATGCAACCCAGTTTTGCTTTTTGTGCTCCAGCTTCCTTCGTTGAG 
GAGAGTTTGAGTTCTCTGATCAGGACTCTGCCTGTCATTGCTGTTCTCTGACTTCAGATGAGGTCACAATCTGCCCCTGG 
CTTATGCAGGGAGTGAGGCGTGGTCCCCGGGTGTCCCTGTCACGTGCAGGGTGAGTGAGGCGTTGCCCCCAGGTGTCCCT 
GTCACGTGTAGGGTGAGTGAGGCGCGGCCCCCGGGTGTCCCTGTCCCGTGCAGCGTGATTGAGGTGTGGCCCCCGGGTGT 
CCCTGTCACGTGTAGGGTGAGTGAGGCGCCATCCCCGGGTGTCCCTGTCACGTGTAGGGTGAGTGAGGCGTGGTCCCCGG 
GTGTCCCTGTCCCGTGCAGGGTGAGTGAGGCACTGTCCCCGGGTGTCCCTGTCACGTGCAGGGTGAGTGAGGCGCGGTCC 
CCGGGTGTCCCTCTCAGGTGTAGGGTGAGTGAGGCGCGGCCCCAGGGTGTCCCTGTCACGTGTAGGGTGAGTGAGGCACC 
GTCCCTGGGTGTCCCTCCCAGGTATAGGGTGAGTGAGGCACTGTCCCCGGGTGTCCCTGTCACGTGCAGGGTGAGTGAGG 
CGCGGCCCCCGGGTGTCCCTCTCAGGTGCAGGGTGAGTGAGGCGCTGTCCCTGGGTGTCCCTGTCTCGTGTAGGGTGAGT 
GAGGCTCTGTCCCCAGGTGTCCTTGGCGTTTGCTCACTTGAGCTTGCTCCTGAATGTTTGCTCTTTCTATAGCCACAGCT 
GCGCCGGTTGCCCATTGCCTGGGTAGATGGTGCAGGCGCAGTGCTGGTCCCCAAGCCTATCTTTTCTGATGCTCGGCTCT 
TCTTGGTCACCTCTCCGTTCCATTTTGCTACGGGGACACGGGACTGCAGGCTCTCGCCTCCCGCGTGCCAGGCACTGCAG 
CCACAGCTTCAGGTCCGCTTGCCTCTGTTGGGCCTGGCTTGCTCACCACGTGCCCGCCACATGCATGCTGCCAATACTCC 
TCTCCCAGCTTGTCTCATGCCGAGGCTGGACTC7GGGCTGCCTGTGTCTGCTGCCACGTGTTGCTGGAGACATCCCAGAA 
AGGGTTCTCTGTGCCCTGAAGGAAAGCAAGTCACCCCAGCCCCCTCACTTGTCCTGTTTTCTCCCAAGCTGCCCCTCTGC 
TTGGCCCCCTTGGGTGGGTGGCAACGCTTGTCACCTTATTCTGGGCACCTGCCGCTCATTGCTTAGGCTGGGCTCTGCCT 
CCAGTCGCCCCCTCACATGGATTGACGTCCAGCCACAGGTTGGAGT3TCTCTGTCTGTCTCCTGCTCTGAGACCCACGTG 
GAGGGCCGGTGTCTCCGCCAGCCTTCGTCAGACTTCCGTCTTGGGTCTTAGTTTTG.i-A-TTTCACTGATTTACCT GTGACG 



TTTCTATCTCTCCATTGTATGCTT 
CCTCTAAGTGCTGCCTTACCTGCA 
ATACTTC.^\AAGTGTTAATACTTCT 
AATCATTTTGATATCAGTGACTTT 
GTTTATGTTCA.2^GATATGTAGAGT 



'ttcttgg: 



TTAAGT.: 
'AAGTAT' 



■tattctttcattccttttctagc':tcttagtttagtcatg:ctttc 
'tgatgtgaagt.^.-.tctcaacatcagccactttcaagtgttcttaaa 

'CTTATTCTGTGATTTTTTTCTTTGTGCACGCTGTGTTTTG^CGTGA 
'TTAGCTTATTCT 3TGATTTCTTT3AGCAGTGAGTTATTTGAACACT 
^TC-AAGATACGTAGAGTATTT r aagttatcatt ttattattgatttctaa. ':tcagt 
tgtgtagtggtctgtataataccaattatttga_-.gtttgcggag::crtgctttgtg?^tctagtgtgtgcatggtrtccag 
aactgtccattgtaajxtttgagatcctgtc^'-atagtgggcatgcat'jttcactatat :cagcttattaaggtccagtgca
AAGCTTCTGTCTCCTTCTAGATGCATGAAATTCCAAGAA.GGAGGCCATAGTCCCTCACCTGGGGGATGGGTCTGTTCATT
TCTTCTCGTTTGGTAGCATTTATGTGAGGCATTGTTAGGTGCATGCACGTGGTAGAATTTTTATCTTCCTGATGAGTGAA 
TCTTTTGGAGACTTCTATGTCTCTAGTAATCTAGTAATTCTTTTTTTAAATTGCTCTTAGTACTGCCACACTGGGCTTCT 

GAGTCTTGGTCTGTCGCCCAGGGTGAGTGCAGTGGTGTGATCACAGGTCAGTGTAACTTTTACCTTCTGGCCTGAGCCGT
CCTCTCACCTCAGCCTCCTGAGTAGCTGGAACTGCAGACACGCACCGCTACACCTGGCTAATTTTTAAATTTTTTCTGGA 
GACAGGGTCTTGCTGTGTTGCCCAGGCTGGTCTCAAACTCTTGGACTCAAGGGATCCATCTACCTCGGCTTCCCAAAGTG 
CTGAATTACAGGCATGAGCCACCATGTCTGGCCTAATTTTCAACACTTTTATATTCTTATAGTGTGGGTATGTCCTGTTA 
ACAGCATGTAGGTGAATTTCCAATCCAGTCTGACAGTCGTTGTTTAACTGGATAACCTGATTTATTTTCATTTTTTTGTC 

ACTAGAGACCCGCCTGGTGCACTCTGATTCTCCACTTGCCTGTTGCATGTCCTCGTTCCCTTGTTTCTCACCACCTCTTG
GGTTGCCATGTGCGTTTCCTGCCGAGTGTGTGTTGATCCTCTCGTTGCCTCCTGGTCACTGGGCATTTGCTTTTATTTCT 
CTTTGCTTAGTGTTACCCCCTCATCTTTTTATTGTCGTTGTTTGCTTTTGTTTATTGAGACAGTCTCACTCTGTCACCCA 
GGCTGGAGTGTAATGGCACAATCTCGGCTCACTGCAACCTCTGCCTCCTCGGTTCAAGCAGTTCTCATTCCTCAACCTCA 
TGAGTACCTGGGATTACAGGCGCCCACCACCACGCCTGGCTAATTTTTGTATTTTTAGTAGAGATAGGCTTTCACCATGT 

TGGCCAGGCTGGTCTCAAACTCCTGACCTCAAGTGATCTGCCCGCCTTGGCCTCCCACAGTGCTGGGATTACAGGTGCAA
GCCACCGTGCCCGGCATACCTTGATCTTTTAAAATGAAGTCTGAAACATTGCTACCCTTGTCCTGAGCAATAAGACCCTT 
AGTGTATTTTAGCTCTGGCCACCCCCCAGCCTGTGTGCTGTTTTCCCTGCTGACTTAGTTCTATCTCAGGCATCTTGACA 
CCCCCACAAGCTAAGCATTATTAATATTGTTTTCCGTGTTGAGTGTTTCTGTAGCTTTGCCCCCGCCCTGCTTTTCCTCC 
TTTGTTCCCCGTCTGTCTTCTGTCTCAGGCCCGCCGTCTGGGGTCCCCTTCCTTGTCCTTTGCGTGGTTCTTCTGTCTTG 

TTATTGCTGGTAAACCCCAGCTTTACCTGTGCTGGCCTCCATGGCATCTAGCGACGTCCGGGGACCTCTGCTTATGATGC
ACAGATGAAGATGTGGAGACTCACGAGGAGGGCGGTCATCTTGGCCCGTGAGTGTCTGGAGCACCACGTGGCCAGCGTTC 
CTTAGCCAGTGAGTGACAGCAACGTCCGCTCGGCCTGGGTTCAGCCTGGAAAACCCCAGGCATGTCGGGGTCTGGTGGCT 
CCGCGGTGTCGAGTTTGAAATCGCGCAAACCTGCGGTGTGGCGCCAGCTCTGACGGTGCTGCCTGGCGGGGGAGTGTCTG 
CTTCCTCCCTTCTGCTTGGGAACCAGGACAAAGGATGAGGCTCCGAGCCGTTGTCGCCCAACAGGAGCATGACGTGAGCC 

ATGTGGATAATTTTAAAATTTCTAGGCTGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCAAGGCGGG
TGGATCACGAGGTCAGGAGGTCGAGACCATCCTGGCCAACATGATGAAACCCCATCTGTACTAAAAACACAAAAATTAGC 
TGGGCGTGGTGGCGGGTGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATTGCTTGAACCTGGGAGTTGGAA
GTTGCAGTGAGCCGACATTGCACCACTGCACTCCAGCCTGGCAACACAGCGAGACTCTGTCTCAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAATTCTAGTAGCCACATTAAAAAAGTAAAAAAGAAAAGGTGAAATTAATGTAATAATAGATTTTACTGAA 

GCCCAGCATGTCCACACCTCATCATTTTAGGGTGTTATTGGTGGGAGCATCACTCACAGGACATTTGACATTTTTTGAGC
TTTGTCTGCGGGATCCCGTGTGTAGGTCCCGTGCGTGGCCATCTCGGCCTGGACCTGCTGGGCTTCCCATGGCCATGGCT 
GTTGTACCAGATGGTGCAGGTCCGGGATGAGGTCGCCAGGCCCTCAGTGAGCTGGATGTGCAGTGTCCGGATGGTGCACG 
TCTGGGATGAGGTCGCCAGGCCCTGCTGTGAGCTGGATGTGTGGTGTCTGGATGGTGCAGGTCAGGGGTGAGGTCTCCAG 
GCCCTCGGTGAGCTGGAGGTATGGAGTCCGGATGATGCAGGTCCGGGGTGAGGTCGCCAGGCCCTGCTGTGAGCTGGATG 

TGTGGTGTCTGGATGGTGCAGGTCAGGGGTGAGGTCTCCAGGCCCTCGGTAAGCTGGAGGTATGGAGTCCGGATGATGCA
GGTCCGGGGTGAGGTCGCCAGGCCCTGCTGTGAGCTGGATGTGTGGTGTCTGGATGGTGCAGGTCTGGGGTGAGGTCACC 
AGGCCCTGCGGTGAGCTGGGTGTGCGGTGTCTGGATGGTGCAGGTCTGGAGTGAGGTCGCCAGACGGTGCCAGACCATGC 
GGTGAGCTGGATATGCGGTGTCCGGATGGTGCAGGTCTGGGGTGAGGTTGCCAGGCCCTGCTGTGAGTTGGATGTGGGGT 
GTCCGGATGCTGCAGGTCCGGTGTGAGGTCACCAGGCCCTGCTGTGAGCTGGATGTGTGGTGTCTGGATGGTGCAGGTCT 

GGGGTGAAGGTCGCCAGGCCCCTGCTTGTGAGCTGGATGTGTGGTGTCTGGATGGTGCAGGTCTGGAGTGAGGTCGCCAG
GCCCTCGGTGAGCTGGATGTGCAGTGTCCAGATGGTGCAGGTCCGGGGTGAGGTCGCCAGACCCTGCGGTGAGCTGGATG 
TGCGGTGTCTGGATGGTGCAGGTCTGGAGTGAGGTCGCCAGGCCCTCGGTGAGCTGGATGTATGGAGTCCGGATGGTGCC 
GGTCCGGGGTGAGGTCGCCAGACCCTGCTGTGAGCTGGATGTGCGGTGTCTGGATGGTACAGGTCTGGAGTGAGGTCGCC 
AGACCCTGCTGTGAGCTGGATATGCGGTGTCCGGATGGTGCAGGTCAGGGGTGAGGTCTCCAGGCCCTCGGTGAGCTGGA 

GGTATGGAGTCCGGATGATGCAGGTCCGGGGTGAGGTCGCCAGGCCCTGCTGTGAACTGGATGTGCGGCGTCTGGATGGT
GCAGGTCTGGGGTGTGGTCGCCAGGCCCTCGGTGAGCTGGAGGTATGGAGTCCGGATGATGCAGGTCCGGGGTGAGGTCG 
CCAGGCCCTGCTGTGAGCTGGATGTGCGGCGTCTGGATGGTGCAGGTCTGGGGTGTGGTCGCCAGGCCCTCGGTGAGCTG
GAGGTATGGAGTCCGGATGATGCAGGTCCGGGGTGAGGTTGCCAGGCCCTGCTGTGAGCTGGATGTGCTGTATCCGGATG
GTGCAGTCCGGGGTGAGGTCGCCAGGCCCTGCTGTGAGCTGGATGTGCTGTATCCGGATGGTGCAGGTCTGGGGTGAGGT 
CACCAGGCCCTGCGGTGAGCTGGTTGTGCGGTGTCCGGTTGCTGCAGGTCCGGGGTGAGTTCGCCAGGCCCTCGGTGAGC 
TGGATGTGCGGTGTCCCCGTGTCCGGATGGTGCAGGTCCAGGGTGAGGTCGCTAGGCCCTTGGTGGGCTGGATGTGCCGT 
GTCCGGATGGTGCAGGTCTGGGGTGAGGTCGCCAGGCCTTTGGTGAGCTGGATGTGCGGTGTCTGCATGGTGCAGGTCTG
GGGTGAGGTCGCCAGGCCCTTGGTGGGCTGGATGTGTGGTGTCCGGATGGTGCAGGTCCGGCGTGAGGTCGCCAGGCCCT 
GCTGTGAGCTGGATGTGCGGTGTCTGGATGGTGCAGGTCCGGGGTGAGGTAGCCAAGGCCTTCGGTGAGCTGGATGTGGG 
GTGTCCGGATGGTGCAGGTCCGGGGTGAGGTCGCCAGGCCCTGCGGTTAGCTGGATATGCGGTGTCCGGATGGTGCAGGT 
CCGGGGTGAGGTCACCAGGCCCTGCGGTTAGCTGGATGTGCGGTGTCTGGATGGTGCAGGTCCGGGGTGAGGTCGCCAGG 

CCCTGCTGTGAGCTGGATGTGCTGTATCCGGATGGTGCAGGTCCGGGGTGAGGTCGCCAGGCCCTGCAGTGAGCTGGATG
TGCTGTATCCGGATGGTGCAGGTCTGGCGTGAGGTCGCCAGGCCCTGCGGTTAGCTGGATATGCGGTGTCGGATGGTGCA 
GGTCCGGGGTGAGGTCACCAGGCCCTGCGGTTAGCTGGATGTGCGGTGTCCGGATGGTGCAGGTCTGGGGTGAGGTCGCC 
AGGCCCTGCTGTGAGCTGGATGTGCTGTATCCGGATGGTGCAGGTCCGGGGTGAGGTCGCCAGGCCCTGCGGTGAGCTGG 
ATGTGCTGTATCCGGATGGTGCAGGTCTGGCGTGAGGTCGCCAGGCCCTGCGGTGAGCTGGATGTGCAGTGTACGGATGG 

TGCAGGTCCGGGGTGAGGTCGCCAGGCCCTGCGGTGGGCTGTATGTGTGTTGTCTGGATGGTGCAGGTCCGGGGTGAGTT
CGCCAGGCCCTGCGGTGAGCTGGATGTGTGGTGTCTGGATGCTGCAGGTCCGGGGTGAGTTCGCCAGGCCCTCGGTGAGC 
TGGATATGCGGTGTCCCCGTGTCCGAATGGTGCAGGTCCAGGGTGAGGTCGCCAGGCCCTTGGTGGGCTGGATGTGCCGT 
GTCCGGATGGTGCAGGTCTGGGGTGAGGTCGCCAGGCCCTTGGTGAGCTGGATGTGCGGTGTCCGGATGGTGCAGGTCCG 
GGGTGAGGTCACCAGGCCCTCGGTGATCTGGATGTGGCATGTCCTTCTCGTTTAAG 


Intron 3 (SEQ ID NO 6) 

GTACTGTATCCCCACGCCAGGCCTCTGCTTCTCGAAGTCCTGGAACACCAGCCCGGCCTCAGCATGCGCCTGTCTCCACT 
TGCCTGTGCTTCCCTGGCTGTGCAGCTCTGGGCTGGGAGCCAGGGGCCCCGTCACAGGCCTGGTCCAAGTGGATTCTGTG 
CAAGGCTCTGACTGCCTGGAGCTCACGTTCTCTTACTTGTAAAATCAGGAGTTTGTGCCAAGTGGTCTCTAGGGTTTGTA 

AAGCAGAAGGGATTTAAATTAGATGGAAACACTACCACTAGCCTCCTTGCCTTTCCCTGGGATGTGGGTCTGATTCTCTC
TCTCTTTTTTTTTTCTTTTTTGAGATGGAGTCTCACTCTGTTGCCCAGGCTGGAGTGCAGTGGCATAATCTTGGCTCACT 
GCAACCTCCACCTCCTGGGTTTAAGCGATTCACCAGCCTCAGCCTCCTAAGTAGCTGGGATTACAGGCACCTGCCACCAC 
GCCTGGCTAATTTTTGTACTTTTAGGAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCATGACCTCAGG 
TGATCCACCCACCTTGGCCTCCCAAAGTGCTGGGTTTACAGGCTAAGCCACCGTGCCCAGCCCCCGATTCTCTTTTAATT 

CATGCTGTTCTGTATGAATCTTCAATCTATTGGATTTAGGTCATGAGAGGATAAAATCCCACCCACTTGGCGACTCACTG
CAGGGAGCACCTGTGCAGGGAGCACCTGGGGATAGGAGAGTTCCACCATGAGCTAACTTCTAGGTGGCTGCATTTGAATG 
GCTGTGAGATTTTGTCTGCAATGTTCGGCTGATGAGAGTGTGAGATTGTGACAGATTCAAGCTGGATTTGCATCAGTGAG 
GGACGGGAGCGCTGGTCTGGGAGATGCCAGCCTGGCTGAGCCCAGGCCATGGTATTAGCTTCTCCGTGTCCCGCCCAGGC 
TGACTGTGGAGGGCTTTAGTCAGAAGATCAGGGCTTCCCCAGCTCCCCTGCACACTCGAGTCCCTGGGGGGCCTTGTGAC 

ACCCCATGCCCCAAATCAGGATGTCTGCAGAGGGAGCTGGCAGCAGACCTCGTCAGAGGTAACACAGCCTCTGGGCTGGG
GACCCCGACGTGGTGCTGGGGCCATTTCCTTGCATCTGGGGGAGGGTCAGGGCTTTCCCTGTGGGAACAAGTTAATACAC 
AATGCACCTTACTTAGACTTTACACGTATTTAATGGTGTGCGACCCAACATGGTCATTTGACCAGTATTTTGGAAAGAAT 
TTAATTGGGGTGACCGGAAGGAGCAGACAGACGTGGTGGTCCCCAAGPvTGCTCCTTGTCACTACTGGGACTGTTGTTCTG 

cctggggggcc7tggaggcccctcctccctggac,2igggtaccgtgccttttctactctgctgggcctgcggcctgcggtc 
agggcaccagctccggagcacccgcggccccag7gtccacggagtgccaggctgtcagccacagatgcccaggtccaggt
gtggccgctccagcccccgtgcccccatgggtggt-:ttggggg.^j\.jvaggcca.^gggcagaggtgtcaggagactggtggg 
ctcatgagagctgattctgctccttggctgagct3ccctgagcagcctctcccgccctctccatctgaagggatgtggct 

CTTTCTACCTGG3GGTCCTGCCTGGGGCCAGCCTTGGGCTACCCCAGTGGCT3TACCAGAGGGACAGGCATCCTGTGTGG 
AGGGGCATGGGTTCACGTGGCCCCAGATGCAGCCTGGGACCAGGCTCCCTGGTGCTGATGGTGGGACAGTCACCCTGGGG 
GTTGACCGCCGGACTGGGCGTCCCCAGGGTTGACr^TAGGACCAGGTGTCCAXTGCCCTGCAAGTAGAGGGGCTCTCAG
AGGCGTCTGGCT-^GCATGGGTGGACGTGGCCCCGGGCATGGCCTTCAGCGTGrSCTGCCGTGGGTGCCCTGAGCCCTCAC 

CTATTGCAG

Intron 4 (SEQ ID NO 7)

GTGGCTGTGCTTTGGTTTAACTTCCTTTTTAAACAGAAGTGCGTTTGAGCCCCACATTTGGTATCAGCTTAGATGAAGGG 
CCCGGAGGAGGGGCCACGGGACACAGCCAGGGCCATGGCACGGCGCCAACCCATTTGTGCGCACAGTGAGGTGGCCGAGG 
TGCCGGTGCCTCCAGAAAAGCAGCGTGGGGGTGTAGGGGGAGCTCCTGGGGCAGGGACAGGCTCTGAGGACCACAAGAAG 
CAGCCGGGCCAGGGCCTGGATGCAGCACGGCCCGAGGTCCTGGATCCGTGTCCTGCTGTGGTGCGCAGCCTCCGTGCGCT 
TCCGCTTACGGGGCCCGGGGACCAGGCCACGACTGCCAGGAGCCCACCGGGCTCTGAGGATCCTGGACCTTGCCCCACGG 
CTCCTGCACCCCACCCCTGTGGCTGCGGTGGCTGCGGTGACCCCGTCATCTGAGGAGAGTGTGGGGTGAGGTGGACAGAG 
GTGTGGCATGAGGATCCCGTGTGCAACACACATGCGGCCAGGAACCCGTTTCAAACAGGGTCTGAGGAAGCTGGGAGGGG 
TTCTAGGTCCCGGGTCTGGGTGGCTGGGGACACTGGGGAGGGGCTGCTTCTCCCCTGGGTCCCTATGGTGGGGTGGGCAC 
TTGGCCGGATCCACTTTCCTGACTGTCTCCCATGCTGTCCCCGCCAG 

Intron 5 (SEQ ID NO 8) 

GTGGGTGCCGGGGACCCCCGTGAGCAGCCCTGCTGGACCTTGGGAGTGGCTGCCTGATTGGCACCTCATGTTGGGTGGAG 
GAGGTACTCCTGGGTGGGCCGCAGGGAGTGCAGGTGACCCTGTCACTGTTGAGGACACACCTGGCACCTAGGGTGGAGGC 
CTTCAGCCTTTCCTGCAGCACATGGGGCCGACTGTGCACCCTGACTGCCCGGGCTCCTATTCCCAAGGAGGGTCCCACTG 
GATTCCAGTTTCCGTCAGAGAAGGAACCGCAACGGCTCAGCCACCAGGCCCCGGTGCCTTGCACCCCAGTCCTGAGCCAG 
GGGTCTCCTGTCCTGAGGCTCAGAGAGGGGACACAGCCCGCCCTGCCCTTGGGGTCTGGAGTGGTGGGGGTCAGAGAGAG 
AGTGGGGGACACCGCCAGGCCAGGCCCTGAGGGCAGAGGTGATGTCTGAGTTTCTGCGTGGCCACTGTCAGTCTCCTCGC 
CTCCACTCACACAG 

5'-region intron 6 (SEQ ID NO 9)

GTAAGGTTCACGTGTGATAGTCGTGTCCAGGATGTGTGTCTCTGGGATATGAATGTGTCTAGAATGCAGTCGTGTCTGTG 
ATGCGTTTCTGTGGTGGAGGTACTTCCATGATTTACACATCTGTGATATGCGTGTGTGGCACGTGTGTGTCGTGGTGCAT 
GTATCTGTGGCGTGCATATTTGTGGTGTGTGTGTGTGTGGCACGTGTGTGTCCATGGTGTGTGTGCCTGTGGTGTGCATG 
TGTGTGTGTCTGTGACACGTGCATGTTCATGCTGTGTGCTGCATGTCTGTGATGTGCCTATTTGTGGTGTGTGTGTGCAT 
GTGTCCGTGACATATGCGTGTCTATGGCATGGGTGTGTGTGGCCCCTTGGCCTTACTCCTTCCTCCTCCAGGCATGGTCC 
GCACCATTGTCCTCACGCTCTCGGGTGCTGGTTTGGGGAGCTCCACATTCAGGGTCCTCACTTCTAGCATGGGTGCCCCT 
GTCCTGTCACAGGGCTGGGCCTTGGAGACTGTAAGCCAGGTTTGAGAGGAGAGTAGGGATGCTGGTGGTACCTTCCTGGA 
CCCCTGGCACCCCCAGGACCCCAGTCTGGCCTATGCCGGCTCCATGAGATATAGGAAGGCTGATTCAGGCCTCGCTCCCC 
GGGACACACTCCTCCCAGAGCGGCCGGGGGCCTTGGGGCTCGGCAGGGGTGAAAGGGGCCCTGGGCTTGGGTTCCCACCC 
AGTGGTCATGAGCACGCTGGAGGGGTAAGCCCTCAAAGTCGTGCCAGGCCGGGGTGCAGAGGTGAAGAAGTATCCCTGGA 
GCTTCGGTCTGGGGAGAGGCACATGTGGAAACCCACAAGGACCTCTTTCTCTGACTTCTTGAGCT 

3'-region intron 6 (SEQ ID NO 10)

TGTGGGATTGGTTTTCATGTGTGGGATAGGTGGGGATCTGTGGGATTGGTTTTTATGAGTGGGGTAACACAGAGTTCAAG 
GCGAGCTTTCTTCCTGTAGTGGGTCTGCAGGTGCTCC-ACAGCTTTATTGAGGAGACCATATCTTCCTTTGAACTATGGT 
CGGGTTTATAGTAAGTCAGGGGTGTGGAGGCCTCCCCTGGGCTCCCrGTTCTGTTTCTTCCACTCTGC-GGTCGTGTGGTG 
CCTGCTGTGGTGTGTGGCCGGTGGGC 
CCGTCCTTGGAATTCCCCTGCGAGTT 
GTCTCGCTCTTTTTTGCCCAGGCTGG 
ATTCTCTTGCCTCAGCCTCCCA-AGrA 
AGACGAGGTTTCTCCATGTTGGCCAG 
GCTGG3ATGACAGGTGTGAACCGCCG 
CCTGCAGCCTTGGTGCTGACAACCTC 



caggcctccttgtgttcattggcctggatgtggccctggctacgct 
7ctttctttctttttttctttct7tttttttt77tttgataacaga 
ggcgtgatcttggctcactgcaa.cctgtgcttcctgagttcaagca 
tataggcgcccaccac::atgctgactaattttt' taattttagtag 
cga.ictcctgacctcaggtgatcctcccacctc-gcctcccaaagt 
cgagactcgcttcctgcagcttccgtgagatct xagcgatagctg 
ttctccaggtctcgctaggggtctttccatttc -tgactctcttca
CAGAAGAGTTTCACGTGTGCTGATTTCCCGGCTGTTTCCTGCGTAATTGGTGTCTGCTGTTTATCGATGGCCTCCTTCCA
TTTCCTTTAGGCTTTGTTTATTGTTGTTTTTCCGGCTCCTTGAAGGAAAAGTTTCGATTATGGATGTTTGAACTTTCTTT 
TCTAAACAAGCATCTGAAGTTGCCGTTTTCCCTCTAAAGCAGGGATCCCGAGGCCCCTGGCTGTGGAGTGGCACCGGTCT 
GGGGCCTGTTAGGAACCCGGCGCACAGCGGGAGGCTAGGTGGGGTGTGGGGAGCCAGCGTTCCCGCCTGAGCCCCGCCCC 
TCTCAGATCAGCAGTGGCATGCGGTGCTCAGAGGCGCACACACCCTACTGAGAACTGTGCGTGAGAGGGGTCTAGATTCT 
GTGCTCCTTATGGGAATCTAATGCCTGATGATCTGAGGTGGAACCGTTTGCTCCCAAAACCATCCCCTTCCCCACTGCTG 
TCCTGTGGAAAAATCGTCTTCCACGAAACCAGTCCCTGGTACCACAATGGTTGGGGACCCTGTGCTAAAGACCTGCTTCA 
GCAGCCTCTCGTCAGTGTTGATATATTGGCTTTTCTGTGTTGAGTCCAGAATAATTACGGATTTCTGTGATGCTTTCCGC 
CGACCTCAGACCCATGGGCTATTTGTGGGCGTGTTGCCTGCTCCTGGGTTGGGAAGGGTGCAGGCCCCATGTACCTTCCT 
GTTACTGCCTTCCAGGTTGGTTCTCAGGGTTGAATCGTACTCGATGTGGTTTTAGCCCACGGCCCTGCCGCCAGCTCCTG 
GGGGCTGGGGAACATGCTGAAGCACAGAGTCACCGTGCGCGTCTTTTGATGCCTCACAAGCTCGAGGCCTCCTGTGTCCG 
TGTTAGTGTGTGTCACGTGCCTGCTCACATCCTGTCTTGGGGACGCAGGGGCTTAGCAGGTCCCGTAGTAAATGACAAGC 
GTCCTGGGGGAGTCTGCAGAATAGGAGGTGGGGGTGCCGGTCTCTCTCCCGCGTCTTCAGACTCTTCTCCTGCCTGTGCT 
GTGGCTGCACCTGCATCCCTGCAATCCCTCCAGCACTGGGCTGGAGAGGCCCGGGAGCTCGAGTGCCACTTGTGCCACGT 
GACTGTGGATGGCAGTCGGTCACGGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTTGGTCACAGGGGTCTGATGTGTG 
GTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGG 
ATGGCGGTCGTGGGGTCTGATGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGGTGACTGTGGATGGCGGTCGTG 
GGGTCTGATGTGGTGACTGTGGATGGCAGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATG 
TGGTGACTGTGGATGGCAGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACT 
GTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGG 
CGGTCGTGGGGTCTGATGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGTGATCGGTCA 
CAGGGGTCTGATGTGTGGTGA.CTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGTGATCGGTCACAG 
GGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTTGGTCCCGGGGG 
TCTGATGTGTGGTGACTGTGGATGGCGATCGGTCACAGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCT 
GATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGGT 
GACTGTGGATGGCGGTCGTGGGGTCTGATGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGAT 
GGCGGTTGGTCCCGGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGGTGACTGTGGATGGCAG 
TCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGG 
TCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGT 
GGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGTGATCGGTCACAGGGGTCTGATGTGTGGT 
GACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGGTGACTGTGGAT 
GGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTAGGGTCTGATGTGTGGTGACTGTGGATGGCAGTCG 
GTCACAGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGG 
GGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGATGTGTGGTGACTGTGGATGGCGGTCGTGGGGTCTGAT 
GTGGTGACTGTGGATGGTGATCGGTCACAGGGGTCTGATGTGTGGTAGCTGCAGGTGGAGTCCCAGGTGTGTCTGTAGCT 
ACTTTGCGTCCTCGGCCCCCCGGCCCCCGTTTCCCAAACAGAAGCTTCCC.AGGCGCTCTCTGGGCTTCATCCCGCCATCG 
GGCTTGGCCGCAGGTCCACACGTCCTGATCGGAAGAA-ACAAGTGCCCAGCTCTGGCCGGGGCAGGCCACATTTGTGGCTC 
ATGCCCTCTCCTCTGCCGGCAG 



Intron 7 (SEQ ID NO 11) 

CrCTGGGCACTGCCCTGCAGGGTTGGGCACGGACTCCCAGCAGTGGGTCCTCCCCTGGGCAA.TCACTGGGCTCATGACCG 
GACAGACTCTTGGCCCTG^GGGGCAGTGGGGGG-AATGAGCTGTGATGGGGGCATGATGAGCTGTGTGCCTTGGCGAAATC 
TGAGCTGGGCCATGCCAG3CTGCGACAGCTGCTGCATTCAGGCACCTGCTCACGTTTGACTGCGCGGCCTCTCTCCAGTT 
CCGCAGTGCCTTTGTTCATGATTTGCTAJi_ATGTCTTCTCTGCCAGTTTTGATC7TGA3GCCAA.AGGAJ\AGGTGTCCCCCT 
CCTTTAGGAGGGCAGGCCATGTTTGAGCCGTGTCCTGCCCAGCTGGCCCCTCAGTGCTGGGTCTGAGGCCAA_AGGAAACG 
GGCCCTGTGGCCCTTTGCAGATGTGGTCTGTCCACGTGGCCCTGTGGCTCTTTGCAGATGCCTGTTAGCACTTGCTCGGC 
TCTAGGGGACAGTCGTGTCCACCGCATGAGGCTCAGAGACCTCTGGGCGAATTTCCTTGGCTCCCAGGGTGGGGGTGGAG 
GTGGCCTGGGCTGCTGGGACCCAGACCCTGTGCCCGGCAGCTGGGCAGCAACTCCTGGATCACATATGCCATCCGGGCCA 
CGGTGGGCTGTGTGGGTGTGAGCCCAGCTGGACCCACAGGTGGCCCAGAGGAGACGTTCTGTGTCACACACTCTGCCTAA 
GCCCATGTGTGTCTGCAGAGACTCGGCCCGGCCAGCCCACGATGGCCCTGCATTCCAGCCCAGCCCCGCACTTCATCACA 
AACACTGACCCCAAAAGGGACGGAGGGTCTTGGCCACGTGGTCCTGCCTGTCTCAGCACCCACCGGCTCACTCCCATGTG 
TCTCCCGTCTGCTTTCGCAG 

Intron 8 (SEQ ID NO 12) 

GTGAGTCAGGTGGCCAGGTGCCATTGCCCTGCGGGTGGCTGGGCGGGCTGGCAGGGCTTCTGCTCACCTCTCTCCTGCCC 
CTTCCCCACTGNCCTTCTGCCCGGGGCCACCAGAGTCTCCTTTTCTGGCCCCCGCCCCCTCCGGCTCCTGGGCTGCAGGC 
TCCCGAGGCCCCGGAAACATGGCTCGGCTTGCGGCAGCCGGAGCGGAGCAGGTGCCACACGAGGCCTGGAAATGGCAAGC 
GGGGTGTGGAGTTGCTCCTGCGTGGAGGACGAGGGGCGGGGGGTGTGTCTGGGTCAGGTGTGCGCCGAGCGTTTGAGCCT 
GCAGCTTGTCAGCTCCAAGTTACTACTGACGCTGGACACCCGGCTCTCACACGCTTGTATCTCTCTCTCCCGATACAAAA 

GGATTTTATCCGATTCTCATTCCTGTCCCTGTCGTGTGACCCCCGCGAGGGCGCGGGCTCTTCTCTCTGTGACTAGATTT 
CCCATCTGGAAAGTGCGGGGTTGACCGTGTAGTTTGCTCCTCTCGGGGGGCCTGTGGTGGCCATGGGGCAGGCGGCCTGG 
GAGAGCTGCCGTCACACAGCCACTGGGTGAGCCACACTCACGGTGGTAGAGCCACAGTGCCTGGTGCCACATCACGTCCT 
CTGGATTTTAAGTAAAACCACACACCTCCCGGCAGGCATCTGCCTGCGACCCTGTGTGTGCCTGGGGAGAGTGGTAGCAC 
GGAGGAAATTCGTGCACACTCAAGGTCATCAGCAAGGTCATCCGCAGTCAGGTGGAACGTGGAGGCCTCTCTCTGGGATC 

GTCTCCAGCGGATAAAGGACTGTGCACAGCTTCGGAAGCTTTTATTTAAAAATATAACTATTAATTATTGCATTATAAGT 
AATCACTAATGGTATCAGCAATTATAATATTTATTAAAGTATAATTAGAAATATTAAGTAGTACACACGTTCTGGAAAAA 
CACAAATTGCACATGGCAGCAGAGTGAATTTTGGCCGAGGGACACGTGTGCACATGTGTGTAAGCGGCCCCCAGGCCCAC 
AGAATTCGCTGACAAAGTCACCTCCCCAGAGAAGCCACCACGGGCCTCCTTCGTGGTCGTGAATTTTATTAAGATGGATC 
AAGTCACGTACCGTCCACGTGTGGCAGGGCTTTGGGGAATGTGAGGTGATGACTGCGTCCTCATGCCCTGACAGACAGGA 

GGTGACTGTGTCTGTCCTGTCCCTAGGACACGGACAGGCCCGAAGCTCTAGTCCCCATCGTGGTCCAGTTTGGCCTCTGA 
ATAAAAACGTCTTCAAAACCTGTTGCCCCAAAAACTAAGAACAGAGAGAGTTTCCCATCCCATGTGCTCACAGGGGCGTA 
TCTGCTTGCGTTGACTCGCTGGGCTGGCCGGACTCCTAGAGTTGGTGCGTGTGCTTCTGTGCAAAAAGTGCAGTCCTCTT 
GCCCATCACTGTGATATCTGCACCAGCAAGGAAAGCCTCTTTTCTTTTCTTTCTTTTTTTTTTTTTGAGACGGAACGTCA 
CTGTTGTCTGCCTGGGCTTGAGTGCAGTGGCGCGATCTCAACTCACTGCAACCTCCGCCTCCCGGGTTCCAGCATTTCTC 

CTGCCTCAGCCTCCCGAGCAGCTGAGATTACAGGCACCCACCCCCTGCGCCTGGCTAATTTTTGTATTTTTAGTAGAGAG 
GGGTTTTTGCCATGTTGGCCAGGCTGGTCTCGAACTCCTGACCTCAGGTGATCCACCCACCTCGGCCTCCCAAAGTGCTG 
GGATTACAGGTGTGAGCCATCACGCCCAGCCGGAAAGCCTCTTTTTAAGGTGACCACCTATAGCGCTTCCCGAAAATAAC 
AGGTCTTGTTTTTGCAGTAGGCTGCAAGCGTCTCTTAGCAACAGGAGTGGCGTCCTGTGGGCTCTGGGGATGGCTGAGGG 
7CGCGTGGCAGCCATGCCTTCTGTGTGCACCTTTAGGTTCCACGGGGCTATTCTGCTCTCACTGTTTGTCTGAAAACGCA 

CCCTTGGCATCCTTGTTTGGA3AGTTTCTGCTTCTCGTTGGTCATGCTGAAACTAGGGGCA.AGGTTGTATCCGTTGGCGC 
G3AGCGGCTACATGTAGGGTCATGAGTCTTTCACCGTGGACAAATTCCTTGAAA,AAAAJLAAAAGGAGTCCGGTTAAGCAT 
T:ATTCCGGGTCAAGTGTCTGGTTCTGTG,fU\T;iAJiCTCTAAGATTTAAGAAACCTTAATGA-AAGAAAACCTTGATGATTC 
P3AGC.AAGGATGTGGTCACACCTGTGGCTGGATCTGTTTCAGCCGCCCCAGTGCATGGTGAGAGTGGGGAGCAGGGATTG 
:TTGTTCAGAGGTCTCATCTGGTATGTTTCTGA3GTGTTTGCCGGCTGAATGGTAGACGTGTCGTTTGTGTGTATGAGGT 

40 : CTGTGTCTGTGTGTGGCTCG3TTTGAGTGTACGCATGTCCAGCACATGCCCTGCCCGTCTCTCACCTGTGTCTTCCCGC 



Intron 9 (SEQ ID NO 13) 

45 : :tg3agaccatgactgctct :tcttgagg.az\c::a 
TTGCACAGCCTGAGGACTGCGGGCTCCACGCAGGCTCTGTCCAGCGGCCATGTCCAGAGGCCTCAGGGCTCAGCAGGCGG 
GAGGGCCGCTGCCCTGCATGATGAGCATGTGAATTCAACACCGAGGAAGCACACCAGCTTCTGTCACGTCACCCAGGTTC 
CGTTAGGGTCCTTGGGGAGATGGGGCTGGTGCAGCCTGAGGCCCCACATCTCCCAGCAGGCCCTCGACAGGTGGCCTGGA 
CTGGGCGCCTCTTCAGCCCATTGCCCATCCCACTTGCATGGGGTCTACACCCAAGGACGCACACACCTAAATATCGTGCC 

ATGTGCACGACGTGCAGGTTAGTTACATATGTATACATGTGCCATGTTGGTGTGCTGCACCCATTAACTCATCATTTACA 
TTAGGTATATCTCCTAATGCTATCCCTCCCCACTCCCCCCATCCCATGACAGGCCCTGGTGTGTGATGTTCCCCACCCTG 
TGTCCAAGTGTTCTCATTGTTCAGTTCCCACCTGTGAGTGAGAACATGTGGTGTTTGGTTTTCTTTCCTTGCAATAGTTT 
GCTCAGAGTGATGGTTTCCAGCTTCGTCCATGTCCCTACAAAGGACATGAACTCATCCTTTTTTATGACTGCATAGTATT 

CCGTGGTGTATATGTGCCACATTTTCTTAATCCAGTCTATCATCGATGGACATTTGGGTTGGTTGCAAGTCTTTGCTACT 
GTGAATAGTGCCGCAATAAACATACGTGTGCATGTGTCTTTATAGCAGCATGATTTATAATCCTTTGGGTATATACCCAG 
TAATGGGATGGCTGGGTCAAATGGTATTTCTAGTTCTAGATCCTTGAGGAATCACCACACTGTCTTCCACAATGGTTGAA 
CTAGTTTACACTCCCACCAACAGTGTAAAAGTGTTCTGGTGCTGGAGAGGATGTGGACAGCAGTTATTTTTTTATGAAAA 
TAGTATCACTGAACAAGCAGACAGTTAGTGAAGGATGCGTCAGGAAGCCTGCAGGCCACACAGCCATTTCTCTCGAAGAC 

TCCGGGTTTTTCCTGTGCATCTTTTGAAACTCTAGCTCCAATTATAGCATGTACAGTGGATCAAGGTTCTTCTTCATTAA 
GGTTCAAGTTCTAGATTGAAATAAGTTTATGTAACAGAAACAAAAATTTCTTGTACACACAACTTGCTCTGGGATTTGGA 
GGAAAGTGTCCTCGAGCTGGCGGCACACTGGTCAGCCCTCTGGGACAGGATACCTCTGGCCCATGGTCATGGGGCGCTGG 
GCTTGGGCCTGAGGGTCACACAGTGCACCATGCCCAGCTTCCTGTGGATAGGATCTGGGTCTCGGATCATGCTGAGGACC 
ACAGCTGCCATGCTGGTAAAGGGCACCACGTGGCTCAGAGGGGGCGAGGTTCCCAGCCCCAGCTTTCTTACCGTCTTCAG 

TTATTTTTCCCTAAGAGTCTGAGAAGTGGGGCCGCGCCTGATGGCCTTCGTTCGTCTTCAGCTGGCACAGAATTGCACAA 
&CTGATGGTA-AACACTGAGTACTTATAATGAATGAGGAATTGCTGTAGCAGTTAACTGTAGAGAGCTCGTCTGTTGGAAA 
GAAATTTAAGTTTTTCATTTAACCGCTTTGGAGAATGTTACTTTATTTATGGCTGTGTAAATTGTTTGACATTCAGTCCC 
TCGTAGACAGATACTACGTAAAAAGTGTAAAGTTAACCTTGCTGTGTATTTTCCCTTATTTTAG 

Intron 10 (SEQ ID NO 14) 

GTGAGGCCCGTGCCGTGTGTCTGTGGGGACCTCCACAGCCTGTGGGCTTTGCAGTTGAGCCCCCCGTGTCCTGCCCCTGG 
CACCGCAGCGTTGTCTCTGCCAAGTCCTCTCTCTCTGCCGGTGCTGGATCCGCAAGAGCAGAGGCGCTTGGCCGTGCACC 
CAGGCCTGGGGGCGCAGGGGCACCTTCGGGAGGGAGTGGGTACCGTGCAGGCCCTGGTCCTGCAGAGACGCACCCAGGTT 
ACACACGTGGTGAGTGCAGGCGGTGACCTGGCTCCTGCTGCTCTTTGGAAAGTCAAGAGTGGCGGCTCCTGGGGCCCCAG 

TGAGACCCCCAGGAGCTGTGCACAGGGCCTGCAGGGCCGAGGCGGCAGCCTCCTCCCCAGGGTGCACCTGAGCCTGCGGA 
GAGCAGGAGCTGCTGAGTGAGCTGGCCCACAGCGTTCGCTGCGGTCACGTTCCTGCGTGGGGTTGTTTGGGATCGGTGGG 
AGAATTTGGATTTGCTGAGTGCTGCTGTCTTGAACCACGGAGATGGCTAGGAGTGGGTTTCAGAGTTGATTTTTGTGAAT 
CAAi^.CTAA/iATCAGGCACAGGGGACCTGGCCTCAGCACAGGGGATTGTCCAATGTGGTCCCCCTCAAGGGCGCCCCACAG 
AGCCGGTGGGCTTGTTTTAAAGTGCGATTTGACGAGGGACGAGAAACCTTGAAAGCTGTAAAGGGAACCCTCAGAAAATG 

TGGCCGCCAGGGGTGGTTTCAGGTGCTTTGCTGGGCTGTGTTTGTGAAAACCCATTTGGACCCGCCCTCCAA3TCCACCC 
TCCAGGTCCACCCTCCAGGGCCGCCCTGGGCTGGGGGTATGCCTGGCGTTCCTTGTGCCGCAGCCCGGAGCA"AGCAGGC 
TGTGCACATTTAAATCCACTAAGATTCACTCGGGGGGAGCCCAGGTCCCAAGCAACTGhGGGCTCAGGAGTC^TGAGGCT 
GCTGAGGGGACAGAGCAGACGGGGAACGCTGCTTCTGTGTGGCAAGTTCCTGAGGGTGCTGGCCAGGGAGGTGGCTCAGA 
GTGTATGTTGGGGTCCCACCGGGGGCAGAACTCTGTCTCTGATGAGTCGGCAGCCATGTAACAGGAAGGGGT'^GCCACAG 

GGAGCTGGGAATGCACCAGGGGAGCTGCGCAGCTGGCCGAGGTCCCAGGGCCAGGCCACAGGAAGGGCAGGG3GACGCCC 
GGGGCCACAGCAGAGGCCGCAGGAAGGGAa.GGGGATGCCCAGGCCAGAGCAGAGGCTA:CGGGCACAGGGGG:;CTCCCTG 
AGCTGGGTGAGCGAGGCTCATGACTCGGCGAGGGAACCTCCTTGACGTG-AACCTGACGACTGGTGTTGCCC? xtcacag 
CCCAGCCAGGTCCCGCGCCTGAGCAGGAACTCAGAACCCTCCCCTTTGTCTA^^AGCACAGCAGATGCCTTC.^- ^GGCATCT 
AGGAGAAAACAGGCAAAGTCGTTGAG.^\AACGTCTTAAAJ\GAAGGTGGGATGGTGGCA.HTTTCrTGTCCAGAT'-TTAGTCT 

45 GCCCCGGACCACAGATGAGTCTATAACGGGATTGTGGTGTTGCCATGGGGACACATGA3ATGGACCATCAC.^ 2i.GGCCAC 
GGGAGACAGGGAAAGCACCCCGAAGTCTGGAGCAGGGCTGGGTCCAGGCTCCTCAGAGCTCCTGCCAGGCCCAGCACCCT 
GCTCCAAATCACCACTTCTCTGGGGTTTTCCAAAGCATTTAACAAGGGTGTCAGGTTACCTCCTGGGTGACGGCCCCGCA 
TCCTGGGGCTGACATTGCCCCTCTGCCTTAG 

Intron 11 (SEQ ID NO 15) 

GTGAGCGCACCTGGCCGGAAGTGGAGCCTGTGCCCGGCTGGGGCAGGTGCTGCTGCAGGGCCGTTGCGTCCACCTCTGCT 
TCCGTGTGGGGCAGGCGACTGCCAATCCCAAAGGGTCAGAGGCCACAGGGTGCCCCTCGTCCCATCTGGGGCTGAGCAGA 
AATGCATCTTTCTGTGGGAGTGAGGGTGCTCACAACGGGAGCAGTTTTCTGTGCTATTTTGGTAAAAGGAAATGGTGCAC 
CAGACCTGGGTGCACTGAGGTGTCTTCAGAAAGCAGTCTGGATCCGAACCCAAGACGCCCGGGCCCTGCTGGGCGTGAGT 
CTCTCAAACCCGAACACAGGGGCCCTGCTGGGCATGAGTCCCTCTGAACCCGAGACCCTGGGGCCCTGCTGGGCGTGAGT 
CTCTCCGAACCCAGAGACTTCAGGGCCCTTTTGGGCGTGAGTCTCTCCGCTGTGAGCCCCACACTCCAAGGCTCATCCAC 
AGTCTACAGGATGCCATGAGTTCATGATCACGTGTGACCCATCAGGGGACAGGGCCATGGTGTGGGGGGGGTCTCTACAA 
AATTCTGGGGTCTTGTTTCCCCAGAGCCCGAGAGCTCAAGGCCCCGTCTCAGGCTCAGACACAAATGAATTGAAGATGGA 
CACAGATGCAGAAATCTGTGCTGTTTCTTTTATGAATATWXAGTATCAACATTCCAGGCAGGGCAAGGTGGCTCACACCT 
ATAATCCCAGCACTTTGGGAGGCCGAGGTGGGTGGATCACTTGAGGCCAGGAGTTTGAGGCCAACCTAACCAACATAGTG 
AAATTCCATTTCTACTTAAAAAATACAAAAATTAGCCTGGCCTGGTGGCACACGCCTGTAGTCCCCGCTATGCGGGAGGC 
TGAGGCAGGAGAATCATTTGAACCCAGGAGGCAGAGGTTGCAGTGAGCCGAGATCACACCACTGCACTCCAGCCTGGGCA 
ACAGAGTGAGACTTCATCTTAAAAAAAAAAAAAAAAGTATCAGCATTCCAAAACCATAGTGGACAGGTGTTTTTTTATTC 
TGTCCTTCGATAATATTTACTGGTGCTGTGCTAGAGGCCGGAACTGGGGGTGCCTTCCTCTG7W\GGCACACCTTCATGG 
GAAGAGAAZiTAAGTGGTGAATGGTTGTTAAACCAGAGGTTTAAACTGGGGTCCTGTCGTTCTGAGTTAACAGTCCAGATC 
TGGACTTTGCCTCTTTCCAGAATGCTCCCTGGGGTTTGCTTCATGGGGGAGCAGCAGGTGTGGACACCCTCGTGATGGGG 
GAGCAGCAGGTGCAGACGCCCTCATGATGGGGGAGTGGCAGGTGCAGACACCCTTGTGCATGGTGCCCAGCATGTCCCTG 
TTGCAGCTCCCTCCCCACAAGGATGCCGGTCTCCTGTGCTCCCCACAGTCCCTGCTTCCCTCTCACAGCCTTACCTGGTC 
CTGGCCTCCACTGGCTTTGTCTGCATGATTTCCACATTTCCTGGGCTCCCAGCACCTCTTCGCCTCTCCCAGGCACCTCT 
GCAGTGCTGGCCATACCAGTCAGCTGTGAACTGTCCACTGCTTATTTTGCTCCCCATGAAATGTATTTTTTAGGACAGGC 
ACCCCTGGTTCCAGCCTCTGGCACAGCATCAGTGAATGTTATTGAAGGACAAAGGACAGACAAACAAATCAGGAAAATGG 
GTTCTCTCTAAACACATTGCAAAGCCACAGAGGCTAGTGCAGGATGGGTGGGCATCAGGTCATCAGATGTGGGTCCAATG 
CCAGAATATTCTGTGCTCCCAAAGGCCACTTGGTCAGAGTGTGTGCTTGCAGAGGTGGCTCTAAAAGCTCAGCAGTGGAG 
GCAGTGGTTCGCCATACTCAGGGTG^ACTCACATCCTCTGTGTCTGAAGTATACAGCAGAGGCTTGAAGGGCATCTGGGA 
GAAGAAAACAGGC/iAAATGATTAAGAAAAGTGAAAAAGGAAAAGTGGTAAGATGGGAATTTTCTTGTCCAGATTTTAGTC 
TCCCAAACCACAGCTCAGATGGTAGAATGTGGTCAGAACTGATGGACAGAACAATAGAACAAAACGGAAGCCCTATCTCT 
CAGAAACGTGTGTTMTGTGGTATGTGGCACAGCTGATGGAAAAGAGAGTGTGTGTGTAATTTTTTTTTCTGAGAAAACT 
GACTGGAAGCAAATAAGTTGTGTCTTTACAGCATATACCAGAGCAGATTCTAGGTAGAAGAGGAGACACATGCAAACAAC 
ACCAGCAACAGAAATAAAACAAAAGACTCAMGGGAAGGGAGGTGAACGTTCCCTGGTTTGGTGTTGGGGAAGGACACAC 
AGGGAGGCGGATGAAACCAGTGAGGCAACGGGCATTGCTTTCACTGCAGAGAAACTCAGCTTGCCTGAGCCACAGTGAAA 
ATGGCCATTCCCTGGAGCGTTTGTGCACGTGATTTATTTAAGGCGCCCTGTGAGGTCCTGCACATTCATCCTCTCACTTT 
GTTCTCCTAACCACCTGAGAGGTAGAGGAGGAAAGGCTCCAGGGGAGCAGCCGCCCTTGGTCACCCAGCTGGCAAAGGGC 
ATGCATGATTGCAGCCTGGCCTCCTGCTCCGGGGCCCTTGCTCTGCCCGAGGACCCCACACAZ\GTCAGACCCATAGGCTC 
AGGGTGAGCCGGAGCCCAAGGTCGTGTTGGGGATGGCTGTGAAAGAAGAAATGGACGTCTGATGCACACTTGGGAAGGTC 
CTACCAGCAGCGTCA^^AGAAATGCATGTGAAACTGACAGCGAGACCCATCCCTCAAAGAAACGCACGTGAAACTGATGGC 
GAGACC7GTCCCCATCCCTCATGCTGGCTCCTTTTCTGGGCTTGCCAAGAGCCAGCATCAGGTTGAGGCAAGCTGGAAAG 

ACTT7T :tggaaagcagcttgtttgcatggaz\gtcctcacaatgtcctgtgtcttcccagt.aattccacttctgaagtga 
ccagac.-ttatcacgggtcttatttaccatttccagtgttccaggcagggggacttgccacagcaagtcacgaacctgcc 
caaatacagggctaacgagatattatgcatcacaaaacttgctctgccattaaacatttttcaaagaatttttgaagaat 
gtttaa"ggcacaaa.^cgtttatttcaatgtagcagtgttcaaagctggatgtaaaagaacacaccccaggagcctgccg 

TGAATGTCATGTGTGT7CATCTTTGGACATGGACATACATGGGCAGTGAGTGGTG3TGAGGCCCTGGAGGACATCGGTGG 
GATGCCTCCATCCTGCCCCTCTGGAGACACCATGTGTGCCACGTGCACTCACTGGAGCCCTGTTTAGCTGGTGCCACCTG 
GCTCTTCCATCCCTGAGATTCAAACACAGTGAGATTCCCCACGCCCAACTCAGTGTTCTCCCACAAAAAACCTGAGTCAC 
ACCTGTGTTCACTCGAGGGACGCCCGGGAGCCAGGGCTCCACAGTTTATTATGTGTTTTTGGCTGAGTTATGTGCAGATC 
TCATCAGGGCAGATGATGAGTGCACAAACACGGCCGTGCGAGGTTTGGATACACTCAACATCACTAGCCAGGTCCTGGTG 
GAGTTTGGTCATGCAGAGTCTGGATGGCATGTAGCATTTGGAGTCCATGGAGTGAGCACCCAGCCCCCTCGGGCTGCAGC 
GCATGCCCCAGGCAGGACAAGGAAGCGGGAGGAAGGCAGGAGGCTCTTTGGAGCAAGCTTTGCAGGAGGGGGCTGGGTGT 
GGGGCAGGCACCTGTGTCTGACATTCCCCCCTGTGTCTCAG 

Intron 12 (SEQ ID NO 16) 

GTGAGCAGGCTGATGGTCAGCACAGAGTTCAGAGTTCAGGAGGTGTGTGCGCSAGTATGTGTGTGTGTGTGTGCGCGCGT 
GCCTGCAAGGCTGATGGTGACTGGCTGCACGTAAGAGTGCACATGTACGCATATACACGTGAGCACATACATGTGTGCAT 
GTGTGTACATGAAGGCATGGCAGTGTGTGCACAGGTGTGCAAGGGCACAAGTGTGTGCACATGCGAATGCACACCTGACA 
TGCATGTGTGTTCGTGCACAGTCGTGTGGGCATTCACGTGAGGTGCATGCGTGTGGGTGTGCAGTGTGAGTAGCATGTGT 
GCACATAACATGTATTGAGGGGTCCTCGTGTTCACCCCGCTAGGTCCTCAGCACCAGTGCCACTCCTTACAGGATGAGAC 

GGGGTCCCAGGCCTTGGTGGGCTGAGGCTCTGAAGCTGCAGCCCTGAGGGCATTGTCCCATCTGGGCATCCGCGTCCACT 
CCCTCTCCTGTGGGCTTCTGTGTCCACTCCCCCTCTCCTGTGGGCATTTACATCCACTCCACTCCCTCTCTCCTGTGGGC 
ATCCGCGTCCACTCCCCCTCTCTGTGGGCATCTGCGTCCACCTCCCCTCTCTGTGGGCATTTGCGTCCACTCCCTCTCCT 
GGTTCCTTCCTGTCTTGGCCGAGCCTCGGGGGCAGGCAGATGACACAGAGTCTTGACTCGCCCAGGGTGGTTCGCAGCTG 
CCGGGTGAGGGCCAGGCCGGATTTCACTGGGAAGAGGGATAGTTTCTTGTCAAAATGTTCCTCTTTCTTGTTCCATCTGA 

ATGGATGATAAAGCAAAAAGTAAAAACTTAAAATCCCAGAGAGGTTTCTACCGTTTCTCACTCTTTCTTGGCGACTCTAG 

Intron 13 (SEQ ID NO 17) 

GTGAGCCGCCACCAAGGGGT3CAGGCCCAGCCTCCAGGGACCCTCCGCGCTCTGCTCAC.CTCTGACCCGGGGCTTCACCT 
TGGAACTCCTGGGTTTTAGGGGCAAGGAATGTCTTACGTTTTCAGTGGTGCTGCTGCCTGTGCACAGTTCTGTTCGCGTG 

GCTCTGTGCAAAGCACCTGTTCTCCATCTCTGGGTAGTGGTAGGAGCCGGTGTGGCCCCAGGTGTCCCCACTGTGCCTGT 
GCACTGGCCGTGGGACGTCATGGAGGCCATCCCAGGGCAGCAGGGGCATGGGGTAAAGAGATGTTTATGGGGAGTCTTAG 
CAGAGGAGGCTGGGAAGGTGTCTGAACAGTAGATGGGAGATCAGATGCCCGGAGGATTTGGGGTCTCAGCAAAGAGGGCC 
GAGGTGGGTGCAGGTGAGGGTCGCTGGCCCCACCCCCGGGAAGGTGCAGCAGAGCTGTGGCTCCCCACACAGCCCGGCCA 
GCACCTGTGCTCTGGGCATGGCTGTGCTCCTGGAACGTTCCCTGTCCTGGCTGGTCAGGGGGTGCCCCTGCCAAGAATCG 

ACAACTTTATCACAGAGGGAAGGGCCAATCTGTGGAGGCCACAGGGCCAGCTTCTGCCTGGAGTCAGGGCAGGTGGTGGC 
ACAAGCCTCGGGGCTGTACCAAAGGGCAGTCGGGCACCACAGGCCCGGGCCTCCACCTCAACAGGCCTCCCGAGCCACTG 
GGAGCTGAATGCCAGGAGGCCGAAGCCCTCGCCCCATGAGGGCTGAGAAGGAGTGTGAGCATTTGTGTTACCCAGGGCCG 
AGGCTGCGCGAATTACCGTGCACACTTGATGTGAAATGAGGTCGTCGTCTATCGTGGAAACCCAGCAAGGGCTCACGGGA 
GAGTTTTCCATTACAAGGTCGTACCATGAAAATGGTTTTTAACCCGAGTGCTTGCGCCTTCATGCTCTGGCAGGGAGGGC 

AGAGCCACAGCTGCATGTTACCGCCTTTGCACCAGCTCCAGAGGCTTGGGACCAGGCTGTCTCAGTTCCAGGGTGCGTCC 
GGCTCAGACCGCCCTCCTCTCTGCCTTCTCTCTCTGCCTCAAATCTTCCCTCGTTTGCATCTCCCTGACGCGTGCCTGGG 
CCCTCGTGCA.AGCTGCTTGACTCCTTTCCGGAAACCCTTGGGGTGTGCTGGATACAGGTGCCACTGAGGACTGGAGGTGT 
CTGACACTGT3GTTGACCCCAGGGTCCAGCTGGCGTGCTTGGGGCCTCCTTGGGCCATGATGAGGTCAGAGGAGTTTTCC 

caggtgaaaa.-:tcctgggaaactcccagggccatgtgacctgccacctgctcctcccatattcagctcagtcttgtcctc 

40 aTTTCCCCAC:AGG3TCTCTAGCTCCGAGGAGCTCCCGTAGAGGGCCTGGGCTCAGG3CAGGGCGGCTGAGTTTCCCCAC 
CCATGTGGGG-\CCCTTGGGTAGTCGCTTGATTGGGTAGCCCTGAG3AGGCCGAGATGCGATGGGCCACGGGCCGTTTCCA 
aA.CACAGAGT ^AGGCACGTGGAAGGCCCAGGAATCCCCTTCCCT 33AGGCAGGAGTGGGAGAACGG^\GAGCTGGGCCCCG 
ATTTCACGGC;-.GCCAGGCTGCAG73GG3GAGGCTGTGGTGGTCCACGTGGCGCTGGGGGCGGGGTCTGATTCAAATCCGC 
TGGGGCTCGG ;CTT3CTGGCCCG7 3 3T3GCCGCGCCTCCACACGG3CTTGGGGTGGA3G33CCGACCTCTAGCAGGTGGC 

45 TATTTCTCCC-:TTGGAA3AGAGCCCCT3ACCCATGCTAGGTGTTTCCCTCCTGGGTCAGG'\GCGTGGCCGTGTGGCAACC 
CCGGGACCTTAGGCTTATTTATTTGTTTAAAAACATTCTGGGCCTGGCTTCCGTTGTTGCTAAATGGGGAAAAGACATCC 
CACCTCAGCAGAGTTACTGAGAGGCTGAAACCGGGGTGCTGGCTTGACTGGTGTGATCTCAGGTCATTCCAGAAGTGGCT 
CAGGAAGTCAGTGAGACCAGGTACATGGGGGGCTCAGGCAGTGGGTGAGATGAGGTACACGGGGGGCTCAGGCAGTGGGT 
GAGGCCAGGTACATGGGGGGCTCAGGCACTGGGTGAGATGAGGTACACGGGGGGCTCAGGCAGAGGGTCAGACCAGGTAC 
ACGGGGGCTCTGATCACACGCACATATGAGCACATGTGCACATGTGCTGTTTCATGGTAGCCAGGTCTGTGCACACCTGC 
CCCAAAGTCCCAGGAAGCTGAGAGGCCAAAGATGGAGGCTGACAGGGCTGGCGCGGTGGCTCACACCTGTAGTCCCAGCA 
CTTTGGGAGGCCGAGGCGAGAGGATCCCTTGAGCCCAGGAGTTTAAGACCAGCCTGAGCAACATAGTAGAACCCCATCTC 
TATGAAAAATAAAAACAAAAATTAGCTGAACATGGTGGTGTGCGCCTGTAGTTCCAATACTTGGGAGGCTGAAGTGGGAG 
GATCACTTGAGCCCAGGAGGTGGAAGCTGCAGTGAGCTGAGATTGCACCACTGTACTGCAGCCTGGGTGACAGAGTGAGA 

GCCCATCTCAACAACAACAAAGAAGACTGACAAATGCAGTTTCTTGGAAAGAAACATTTAGTAGGAACTTAACCTACACA 
CAGAAGCCAAGTCGGTGTCTCGGTGTCAGTGAGATGAGATGATGGGTCCTCACACCATCACCCCAGACCCAGGGTTTATG 
CACCACAGGGGCGGGTGGCTCAG7\AGGGATGCGCAGGACGTTGATATACGATGACATCAAGGTTGTCTGACGAAGGGCAG 
GATTCATGATAAGTACCTGCTGGTACACAAGGAACAATGGATAAACTGGAAACCTTAGAGGCCTTCCCGGAACAGGGGCT 
AATCAGAAGCCAGCATGGGGGGCTGGCATCCAGGATGGAGCTGCTTCAGCCTCCACATGCGTGTTCATACAGATGGTGCA 

CAGAAACGCAGTGTACCTGTGCACACACAGACACGCAGCTACTCGCACACACAAGCACACACACAGACATGCATGCATGC 
ATCCGTGTGTGTGCACCTGTGCCCATGAGGAAACCCATGCATGTGCATTCATGCACGCACACAGGCACCGGTGGGCCCAT 
GCCCACACCCACGAGCACCGTCTGATTAGGAGGCCTTTCCTCTGACGCTGTCCGCCATCCTCTCAG 

Intron 14 (SEQ ID NO 18) 

GTATGTGCAGGTGCCTGGCCTCAGTGGCAGCAGTGCCTGCCTGCTGGTGTTAGTGTGTCAGGAGACTGAGTGAATCTGGG 
CTTAGGAAGTTCTTACCCCTTTTCGCATCAGGAAGTGGTTTAACCCAACCACTGTCAGGCTCGTCTGCCCGCCCTCTCGT 
GGGGTGAGCAGAGCACCTGATGGAAGGGACAGGAGCTGTCTGGGAGCTGCCATCCTTCCCACCTTGCTCTGCCTGGGGAA 
GCGCTGGGGGGCCTGGTCTCTCCTGTTTGCCCCATGGTGGGATTTGGGGGGCCTGGCCTCTCCTGTTTGCCCTGTGGTGG 
GATTGGGCTGTCrCCCGTCCATGGCACTTAGGGCCCTTGTGCAAACCCAGGCCAAGGGCTTAGGAGGAGGCCAGGCCCAG 

GCTACCCCACCCCTCTCAGGAGCAGAGGCCGCGTATCACCACGACAGAGCCCCGCGCCGTCCTCTGCTTCCCAGTCACCG 
TCCTCTGCCCCTGGACACTTTGTCCAGCATCAGGGAGGTTTCTGATCCGTCTGAAATTCAAGCCATGTCGAACCTGCGGT 
CCTGAGCTTAACAGCTTCTACTTTCTGTTCTTTCTGTGTTGTGGAAATTTCACCTGGAGAAGCCGAAGAAAACATTTCTG 
TCGTGACTCCTGCGGTGCTTGGGTCGGGACAGCCAGAGATGGAGCCACCCCGCAGACCGTCGGGTGTGGGCAGCTTTCCG 
GTGTCTCCTGGGAGGGGAGCTGGGCTGGGCCTGTGACTCCTCAGCCTCTGTTTTCCCCCAG 


Intron 15 (SEQ ID NO 19) 

GCAAGTGTGGGTGGAGGCCAGTGCGGGCCCCACCTGCCCAGGGGTCATCCTTGAACGCCCTGTGTGGGGCGAGCAGCCTC 
AGATGCTGCTGAAGTGCAGACGCCCCCGGGCCTGACCCTGGGGGCCTGGAGCCACGCTGGCAGCCCTATGTGATTAAACG 
CTGGTGTCCCCAGGCCACGGAGCCTGGCAGGGTCCCCAACTTCTTGAACCCCTGCTTCCCATCTCAGGGGCGATGGCTCC 
CCACGCTTGGGAGCCTTCTGACCCCTGACCTGTGTCCTCTCACAGCCTCTTCCCTGGCTGCTGCCCTGAGCTCCTGGGGT 
CCTGAGCAAGTTCTCTCCCCGCCCCGCCGCTCCAGCGTCACTGGGCTGCCTGTCTGCTCGCCCCGGTGGAGGGGTGTCTG 
TCCCTTCACTGAGGTTCCCACCAGCCAGGGCCACGAGGTGCAGGCCCTGCCTGCCCGGCCACCCACACGTCCTAGGAGGG 
TTGGAGGATGCCACCTCTGGCCTCTTCTGGAACGGAGTCTGATTTTGGCCCCGCAG 

3'-untranscribed region (SEQ ID NO 20) 

ATCTCATGTTTGAATCCTAATGTGCACTGCATAGACACCACTGTATGCAATTACAGAAGCCT3TGAGTGAACGGGGTGGT 
GGTCAGTGCGGGCCCATGGCCTGGCTGTGCATTTACGGAAGTCTATGAGTG.-J\TGGGGTTGT'JGTCAGTGCGGGCCCATG 
GCCTGGCTGGGCCTGGGAGGTTTCTGATGCTGTGAGGCAGGAGGGGAAGGAGGGTAGGGGATAGACAGTGGGAGCCCCCA 
CCCTGGAAGACATAACAGTAAGTCCAGGCCCGAAGGGCAGCAGGGATGCTGGGGGCCCAGCTTGGGCGGCGGGGATGATG 
GAGGGCCTGGCCAGGGTGGCAGGGATGATGGGGGCCCCAGCTGGGGTGGCAGGGGTGATGGGGGGGGCTGGTCTGGGTGG 
CGGGGAAGATGGGGAAGCCTGGCTGGGCCCCCTCCTCCCCTGCCTCCCACCTGCAGCCGTGGATCCGGATGTGCTTCCCT 
GGTGCACATCCTCTGGGCCATCAGCTTTCATGGAGGTGGGGGGCAGGGGCATGACACCATCCTGTATAAAATCCAGGATT 
CCTCCTCCTGAACGCCCCAACTCAGGTTGAAAGTCACATTCCGCCTCTGGCCATTCTCTTAAGAGTAGACCAGGATTCTG 
ATCTCTGAAGGGTGGGTAGGGTGGGGCAGTGGAGGGTGTGGACACAGGAGGCTTCAGGGTGGGGCTGGTGATGCTCTCTC 
ATCCTCTTATCATCTCCCAGTCTCATCTCTCATCCTCTTATCATCTCCCAGTCTCATCTGTCTTCCTCTTATCTCCCAGT 
CTCATCTGTCATCCTCTTACCATCTCCCAGTCTCATCTCTTATCCTCTTATCTCCTAGTCTCATCCAGACTTACCTCCCA 
GGGCGGGTGCCAGGCTCGCAGTGGAGCTGGACATACGTCCTTCCTCAGGCAGAAGGAACTGGAAGGATTGCAGAGAACAG 
GAGGGGCGGCTCAGAGGGACGCAGTCTTGGGGTGAAGAAACAGCCCCTCCTCAGAAGTTGGCTTGGGCCACACGAAACCG 
AGGGCCCTGCGTGAGTGGCTCCAGAGCCTTCCAGCAGGTCCCTGGTGGGGCCTTATGGTATGGCCGGGTCCTACTGAGTG 

CACCTTGGACAGGGCTTCTGGTTTGAGTGCAGCCCGGACGTGCCTGGTGTCGGGGTGGGGGCTTATGGCCACTGGATATG 
GCGTCATTTATTGCTGCTGCTTCAGAGAATGTCTGAGTGACCGAGCCTAATGTGTATGGTGGGCCCAAGTCCACAGACTG 
TGTCGTAAATGCACTCTGGTGCCTGGAGCCCCCGTATAGGAGCTGTGAGGAAGGAGGGGCTCTTGGCAGCCGGCCTGGGG 
GCGCCTTTGCCCTGCAAACTGGAAGGGAGCGGCCCCGGGCGCCGTGGGCGGACGACCTCAAGTGAGAGGTTGGACAGAAC 
AGGGCGGGGACTTCCCAGGAGCAGAGGCCGCTGCTCAGGCACACCTGGGTTTGAATCACAGACCAACaGGTCAGGCCATT 

GTTCAGCTATCCATCTTCTACAAAGCTCCAGATTCCTGTTTCTCCGGGTGTTTTTTGTTGAAATTTTACTCAGGATTACT 
TATATTTTTTGCTAAAGTATTAGACCCTTAAAAAAGGTATTTGCTTTGATATGGCTTAACTCACTAAGCACCTACTTTAT 
TTGTCTGTTTTTATTTATTATTATTATTATTATTAGAGATGGTGTCTACTCTGTCACCCAGGTTGTTAGTGCAGTGGCAC 
AGTCATGGCTCGCTGTAGCCGCAAACCCCCAGGCTCAAGTGATCCTCCGGCCTCAGCTTCCCAGAGTGCTGGGATTACAG 
GTGTGAGCCACTGCCCTTGCCTGGCACTTTTAAAAACCACTATGTAAGGTCAGGTCCAGTGGCTTCCACACCTGTCATCC 

CAGTAGTTTGGGAAGCCGAGGCAGAAGGATTGTCTGAGGCCAGGAGTTTGAGACCAGCATGGGTAACATAGGGAGACCCC 
ATCTCTACAAAAAATGCAAAAAGTTATCCGGGCGTGGGGTCCAGCATCTGTAGTCCCAGCTGCTCGGGAGGCTGAGTGGG 
AGGATCGCTTGAGCCCGGGAGGTCATGGCTGCAGTGAGCTGTGATTGrACCATCGCACTCCAGCCTGGGCAACAGAGTGA 
GACCCTGTCT CAAAAAAAAAAAAAAAAAAAGAAGGAGAAGGAG AAGAGAAGAAGAAGGAAGAAGGAAAGAGAAGAAGAAG 
GAAGAAGGAAGAAAGAAGGAGAAGGAGGCCTGCTAGGTGCTAGGTAGACTGTCAAATCTCA,GAGCAAAATGAAAATAACA 

AAGTTTTAAAGGGAAAGAAAAACCCCAGCTCTTTGGACTTCCTTAGGCCTGAACTTCATCTCAAGCAGCTTCCTTCCACA 
GAC.aAGCGTGTATGGAGCGAGTGAGTTCAAAGCAGAAAGGGAGGAGAAGCAGGCAAGGGTGGAGGCTGTGGGTGACACCA 
GCCAGGACCCCTGAAAGGGAGTGGTTGTTTTCCTGCCTCAGCCCCACGCTCCTGCCGGTCCTGCACCTGCTGTAACCGTC 
GATGTTGGTGCCAGGTGCCCACCTGGGAAGGATGCTGTGCAGGGGGCTTGCCAAACTTTGGTGGGTTTCAGAAGCCCCAG 
GCACTTGTGGCAGGCACAATTACAGCCCCTCCCCAAAGATGCCCACGTCCTTCTCCTGGAACCTGTGAATGTGTCACCCG 

CAAGGCAGAGGCTGGTGAAGGCTGCAGGTGGAATCACGGCTGCCAGTCAGCCGATCTTAAGGTCATCCTGGATTATCTGG 
TGGGCCTGATATGGCCACAAGGGTCCCTAGAAGTGAGAGAGGGAGGCAGGGGAGAGTCAGAGAGGGGACGTGAGAAGGAC 
CACTGGCCACTGCTGGCTTTGAGATGGAGGAGGGGGTCCCCAGCCAAGGAATGGGGGCAGCCGCTCCATGCTGGAAAAGC 
AAGCAATCCTCCCCGGTCCTGAGGGCACACGGCCCTGCCCACGCCTCGATTTCAGGCCAGTGGGACCTGTTTCAGCTTTC 
CGGCCTCCAGAGCTGTAAGATGATGCGTTTGTGTTCAGCCACTAAGCTGCAGTGATTCGTCACAGCAGCAAATGGAATAG 

CAGTACAGGGAAATGAATACAGGGACAGTTCTCAGAGTGACTCTCAGCCCACCCCTGGG 
Characterization of the exons showed, interestingly, that the functionally important 
hTC protein domains which are described in our Patent Application 
PCT/EP/98/03469 are arranged on separate exons. The telomerase-characteristic T 
motif is located on exon 3. The RT (reverse transcriptase) motifs 1-7, which are 
important for the catalytic function of the telomerase, are located on the following 
exons: RT motifs 1 and 2 on exon 4, RT motif 4 on exon 9, RT motif 5 on exon 10, 
and RT motifs 6 and 7 on exon 11. RT motif 3 is shared by exons 5 and 6 (see 
Fig. 8). 

Elucidation of the exon-intron structure of the hTC gene also shows that the four 
deletion or insertion variants of the hTC cDNA which were described in our Patent 
Application PCT/EP/98/03469, as well as three additional hTC insertion variants 
which are described in the literature (Kilian et al., 1997), in all probability represent 
alternative splicing products. As shown in Fig. 8, the splicing variants can be divided 
into two groups: deletion variants and insertion variants. 

The hTC variants in the deletion group lack specific sequence segments. The 36 bp 
in-frame deletion in variant DEL1 in all probability results from using an alternative 
3' splice acceptor sequence in exon 6, resulting in a part of RT motif 3 being lost. In 
variant DEL2, the normal 5' splice donor and 3' splice acceptor sequences of introns 
6, 7 and 8 are not used. Instead, exon 6 is fused directly to exon 9, which shifts the 
open reading frame and introduces a stop codon in exon 10. Variant DEL3 is a 
combination of variants DEL1 and DEL2. 

The insertion variant group is characterized by the insertion of intron sequences 
which lead to premature termination of translation. In variant INS1, an alternative, 
3'-located splice site is used instead of the normally used 5' splice donor sequence 
of intron 5, resulting in the insertion of the first 38 bp from intron 4 between 
exon 4 and exon 5. The insertion, in variant INS2, of a region of the intron 11 
sequence likewise results from using an alternative 5' splice donor sequence in 
intron 11. Since this variant was only described inadequately in the literature 
(Kilian et al., 1997), it is not possible to determine the precise alternative 5' 
splice donor sequence in this variant. The insertion of intron 14 sequences between 
exon 14 and exon 15 in variant INS3 comes from using an alternative 3' splice 
acceptor sequence, resulting in the 3' part of intron 14 not being spliced. 
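
The difference between the in-frame 36 bp deletion of DEL1 and the 38 bp insertion 
of INS1 can be illustrated with a short reading-frame check. This is only a minimal 
sketch: the function name is hypothetical, and the check tests divisibility by three 
only. It does not model stop codons that an inserted intron sequence may itself carry. 

```python
def preserves_reading_frame(length_change_bp: int) -> bool:
    """Return True if an insertion (+) or deletion (-) keeps the codon phase."""
    return length_change_bp % 3 == 0


# Length changes quoted in the text: DEL1 loses 36 bp, INS1 gains 38 bp.
VARIANTS = {"DEL1": -36, "INS1": +38}

for name, change in VARIANTS.items():
    outcome = "in frame" if preserves_reading_frame(change) else "frameshift"
    print(f"{name}: {change:+d} bp -> {outcome}")
```

A length change that is not a multiple of three shifts every downstream codon, which 
is consistent with the premature termination described for the insertion variants. 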

The hTC variant INS4 (variant 4), which is described in our Patent Application 
PCT/EP/98/03469, is characterized by exon 15, and the 5' region of exon 16, 
being replaced by the first 600 bp of intron 14. This variant can be attributed to the 
use of an alternative internal 5' splice donor sequence in intron 14 and an alternative 
3' splice acceptor sequence in exon 16, resulting in an altered C terminus. 

The in vivo generation of hTC protein variants which are probably non-functional 
and which could interfere with the function of the complete hTC protein constitutes a 
possible mechanism, in addition to transcription regulation, for controlling hTC 
protein function. The function of the hTC splicing variants is not yet known. 
Although most of these variants presumably encode proteins without reverse 
transcriptase activity, they could nevertheless play a crucial role as 
transdominant-negative telomerase regulators by, for example, competing for 
interaction with important binding partners. 

The search for possible transcription factor binding sites was carried out using the 
"Find Pattern" algorithm from the Genetics Computer Group (Madison, USA) GCG 
Sequence Analysis program package. This resulted in the identification of a variety 
of potential binding sites for transcription factors in the nucleotide sequence of intron 
2, which binding sites are listed in Tab. 2. In addition, an Sp1 binding site was found 
in intron 1 (pos. 43), and a c-Myc binding site was found in the 5'-untranslated region 
(cDNA position 29-34, cf. Fig. 6). 
Example 6 

In order to ascertain the start point(s) of hTC transcription in HL-60 cells, the 5' end 
of the hTC mRNA was determined by means of primer extension analysis. 

2 µg of poly A+ RNA from HL-60 cells were denatured at 65°C for 10 min. 1 µl of 
RNasin (30-40 U/ml) and 0.3-1 pmol of radioactively labelled primer 
(5'GTTAAGTTGTAGCTTACACTGGTTCTC 3'; 2.5-8x10' cpm) were added for 
primer annealing, and the whole was incubated, at 37°C for 30 min, in a total volume 
of 20 µl. After the addition of 10 µl of 5x reverse transcriptase buffer (from 
Gibco-BRL), 2 µl of 10 mM dNTPs, 2 µl of RNasin (see above), 5 µl of 0.1 M DTT 
(from Gibco-BRL), 2 µl of ThermoScript RT (15 U/µl; from Gibco-BRL) and 9 µl of 
DEPC-treated water, primer extension took place, at 58°C for 1 h, in a total volume 
[lacuna]. The reaction was stopped by adding 4 µl of 0.5 M EDTA, pH 8.0, and the 
RNA was degraded, at 37°C for 30 min, after having added 1 µl of RNaseA 
(10 mg/ml). 2.5 µg of sheared calf thymus DNA and 100 µl of TE were then added, 
and the mixture was extracted once with 150 µl of phenol/chloroform (1:1). The 
DNA was precipitated, at -70°C for 45 min, after adding 15 µl of 3 M Na acetate and 
450 µl of ethanol, and then centrifuged at 14,000 rpm for 15 min. The precipitate was 
washed once with 70% ethanol, dried in air and dissolved in 8 µl of sequencing stop 
solution. After 5 min of denaturation at 80°C, the samples were loaded onto a 6% 
polyacrylamide gel and fractionated electrophoretically (Ausubel et al., 1987) 
(Fig. 5). 

In this connection, a main transcription start site was identified which is located 
1767 bp 5' of the ATG start codon of the hTC cDNA sequence (nucleotide position 
3346 in Fig. 4). In addition to this, the nucleotide sequence around this main 
transcription start (TTA(+1)TTGT) represents an initiator element (Inr), which, in 6 out 
of 7 nucleotides, matches the consensus motif (PyPyA(+1)N(A/T)PyPy) (Smale, 1997) of 
an initiator element. 
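
The "6 out of 7" comparison with the initiator consensus can be reproduced with a 
few lines of code. The sketch below is illustrative only. It assumes that the 
start-site sequence quoted above reads TTATTGT with the A at position +1, and that 
the consensus reads Py-Py-A-N-(A/T)-Py-Py; both readings are assumptions, because 
the superscripts are partly illegible in the source. The function name is 
hypothetical. 

```python
PYRIMIDINES = {"C", "T"}

# Assumed consensus PyPyA(+1)N(A/T)PyPy: one set of allowed bases per position.
INR_CONSENSUS = [
    PYRIMIDINES,            # Py
    PYRIMIDINES,            # Py
    {"A"},                  # A at +1 (transcription start)
    {"A", "C", "G", "T"},   # N, any base
    {"A", "T"},             # A or T
    PYRIMIDINES,            # Py
    PYRIMIDINES,            # Py
]


def count_consensus_matches(site: str) -> int:
    """Count positions of a 7-nt site that agree with the assumed Inr consensus."""
    site = site.upper()
    if len(site) != len(INR_CONSENSUS):
        raise ValueError("site must have the same length as the consensus")
    return sum(base in allowed for base, allowed in zip(site, INR_CONSENSUS))


print(count_consensus_matches("TTATTGT"))
```

Under these assumptions the only mismatch is the G at the sixth position, where the 
consensus calls for a pyrimidine. 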




It was not possible to identify any unambiguous TATA box in the immediate vicinity 
of the experimentally identified main transcription start, which means that the hTC 
promoter should probably be classified in the family of TATA-less promoters (Smale, 
1997). However, a potential TATA box from nucleotide position 1306 to nucleotide 
position 1311 (Fig. 4) was found by means of bioinformatics analysis. The subsidiary 
transcription starts which were additionally observed around the main transcription 
start have also been described in the case of other TATA-less promoters (Geng and 
Johnson, 1993), for example in the strongly regulated promoters of some cell cycle 
genes (Wick et al., 1995). 

Example 7 

In addition to the start point of the hTC transcript which was described in Example 6 
and identified in HL60 cells, a further transcription start region was also identified in 
HL60 cells. With the aid of RT-PCR analyses, the region of the hTC gene 
transcription start in HL60 cells was localized to bp -60 to bp -105. 

The cDNA for this was synthesized using a First Strand cDNA Synthesis kit 
(Clontech), in accordance with the manufacturer's instructions, and employing 0.4 µg 
of HL60 cell polyA RNA (Clontech) and the gene-specific primer GSP13 
(5'-CCTCCAAAGAGGTGGCTTCTTCGGC-3', cDNA position 920-897). In a final 
volume of 50 µl, 10 pmol dNTP mix were added to 1 µl of cDNA, and a PCR 
reaction was carried out in 1x PCR reaction buffer F (PCR-Optimizer kit from 
InVitrogen) and using one unit of platinum Taq DNA polymerase (from Gibco/BRL). 
10 pmol of each of the 5' and 3' primers defined below were added as primers. The 
PCR was carried out in 3 steps. A two-minute denaturation at 94°C was followed by 
36 PCR cycles in which the DNA was first of all denatured at 94°C for 45 sec and, 
after that, the primers were annealed, and the DNA chain was extended, at 68°C for 
5 min. The cycles were concluded by a chain extension at 68°C for 10 min. In all, six 
different 5' PCR primers (primer HTRT5B: 
5'-CGCAGCCACTACCGCGAGGTGC-3', cDNA position 105 to 126; primer C5S: 
5'-CTGCGTCCTGCTGCGCACGTGGGAAGC-3', 5'-flanking region -49 to -23; 
primer PRO-TEST1: 5'-CTCGCGGCGCGAGTTTCAGGCAG-3', 5'-flanking 
region -74 to -52; primer PRO-TEST2: 5'-CCAGCCCCTCCCCTTCCTTTCC-3', 
5'-flanking region -112 to -91; primer PRO-TEST4: 
5'-CCAGCTCCGCCTCCTCCGCGC-3', 5'-flanking region -191 to -171; primer 
RP-3A: 5'-CTAGGCCGATTCGACCTCTCTCC-3', 5'-flanking region -427 to 
-405) were combined with the 3' PCR primer C5Rback 
(5'-GTCCCAGGGCACGCACACCAG-3', cDNA position 245 to 225). Genomic 
DNA was also employed for the PCR, as a control, in addition to the Oligo dT- and 
GSP13-primed cDNAs. As Fig. 9 shows, a PCR product was only obtained with the 
primer combinations HTRT5B-C5Rback, C5S-C5Rback and PRO-TEST1-C5Rback, 
indicating that the start point for hTC transcription lies in the region between bp -60 
and bp -105. 
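
As a consistency check on the primer list above, the length of each oligonucleotide 
can be compared with the span implied by its stated coordinates, |end - start| + 1. 
The following sketch is illustrative only: the dictionary layout and function names 
are hypothetical, and the first RP-3A coordinate is read as -427, which is an 
assumption because that figure is partly illegible in the source. 

```python
# name: (sequence 5'->3', first stated coordinate, second stated coordinate)
PRIMERS = {
    "HTRT5B":    ("CGCAGCCACTACCGCGAGGTGC", 105, 126),
    "C5S":       ("CTGCGTCCTGCTGCGCACGTGGGAAGC", -49, -23),
    "PRO-TEST1": ("CTCGCGGCGCGAGTTTCAGGCAG", -74, -52),
    "PRO-TEST2": ("CCAGCCCCTCCCCTTCCTTTCC", -112, -91),
    "PRO-TEST4": ("CCAGCTCCGCCTCCTCCGCGC", -191, -171),
    "RP-3A":     ("CTAGGCCGATTCGACCTCTCTCC", -427, -405),  # -427 is an assumed reading
    "C5Rback":   ("GTCCCAGGGCACGCACACCAG", 245, 225),
}


def span_length(start: int, end: int) -> int:
    """Number of positions covered by an inclusive coordinate pair."""
    return abs(end - start) + 1


def check_primers(primers: dict) -> dict:
    """Map each primer name to True if its length equals its coordinate span."""
    return {
        name: len(seq) == span_length(start, end)
        for name, (seq, start, end) in primers.items()
    }


for name, ok in check_primers(PRIMERS).items():
    print(f"{name}: {'consistent' if ok else 'length and coordinates disagree'}")
```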

Example 8 

Several extremely GC-rich regions, so-called CpG Islands, are located in the isolated 
5'-flanking region, of about 11.2 kb in size, of the hTC gene. One CpG Island, having 
a GC content of > 70%, extends from bp -1214 into intron 2. Two further GC-rich 
regions having a GC content of > 60% extend from bp -3872 to bp -3113 and from 
bp -5363 to bp -3941, respectively. The positions of the CpG Islands are shown 
graphically in Fig. 11. 
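
GC content thresholds of this kind are usually evaluated over a sliding window. The 
following is a minimal sketch of that calculation, not the procedure actually used 
here, which the text does not specify. The function names, the window size and the 
toy input sequence are all hypothetical. 

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_rich_windows(seq: str, window: int, threshold: float) -> list:
    """Start offsets (0-based) of windows whose GC content exceeds threshold."""
    return [
        start
        for start in range(len(seq) - window + 1)
        if gc_content(seq[start:start + window]) > threshold
    ]


toy_sequence = "ATATGCGC"  # hypothetical input, not taken from the hTC sequence
print(gc_content(toy_sequence))
print(gc_rich_windows(toy_sequence, window=4, threshold=70.0))
```

In practice the window would be far larger than in this toy example, and overlapping 
windows above the threshold would be merged into contiguous regions. 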

The search for possible transcription factor binding sites was carried out using the 
"Find Pattern" algorithm from the Genetics Computer Group (Madison, USA) GCG 
Sequence Analysis program package. This resulted in the identification of a variety 
of potential binding sites in the region up to -900 bp upstream of the translation start 
codon ATG: five Sp1 binding sites, one c-Myc binding site, and one CCAC box 
(Fig. 10). In addition, a CCAAT box and a second c-Myc binding site were found at 
positions -1788 and -3995, respectively, of the 5'-flanking region. 
Example 9 

In order to analyse the activity of the hTC promoter, PCR amplification was used to
generate four hTC promoter sequence segments of differing length, which were
cloned into the Promega vector pGL2 5' of the luciferase reporter gene.
The 8.5 kb SacI fragment which was subcloned from phage clone P12 was selected
as the DNA source for the PCR amplification. In a final volume of 50 µl, 10 pmol of
dNTP mix were added to 35 ng of this DNA, and a PCR reaction was carried out in
1x PCR reaction buffer (PCR-Optimizer kit from Invitrogen) using one unit of
Platinum Taq DNA polymerase (from Gibco/BRL). In each case, 20 pmol of the 5'
and 3' primers defined below were added. The PCR was carried
out in three steps. A two-minute denaturation at 94°C was followed by 30 PCR
cycles in which the DNA was first denatured at 94°C for 45 sec, after which
the primers were annealed, and the DNA chain was extended, at 68°C for 5 min. The
cycles were concluded by a chain extension at 68°C for 10 min. The selected 3' PCR
primer was in each case the primer PK-3A
(5'-GCAAGCTTGACGCAGCGCTGCCTGAAACTCG-3', position -43 to -65),
which recognizes a sequence region 42 bp upstream of the ATG START
codon. A promoter fragment of 4051 bp in size (NPK8) was amplified by combining
the PK-3A primer with the 5' PCR primer PK-5B
(5'-CCAGATCTCTGGAACACAGAGTGGCAGTTTCC-3', position -4093 to
-4070). Combining the pair of primers PK-3A and PK-5C
(5'-CCAGATCTGCATGAAGTGTGTGGGGATTTGCAG-3', position -3120 to
-3096) led to the amplification of a promoter fragment of 3078 bp in size (NPK15).
Use of the primer combination PK-3A and PK-5D
(5'-GGAGATCTGATCTTGGCTTACTGCAGCCTCTG-3', position -2110 to
-2087) amplified a promoter fragment of 2068 bp in size (NPK22). Finally, using the
primer combination PK-3A and PK-5E
(5'-GGAGATCTGTCTGGATTCCTGGGAAGTCCTCA-3', position -1125 to
-1102) led to the amplification of a promoter fragment of 1083 bp in size (NPK27).
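
The four fragment sizes follow directly from the primer positions: each fragment runs from the 5' end of the forward primer to position -43, the 3'-most promoter position covered by PK-3A, inclusive. The short check below is an illustrative sketch; the dictionary and function names are hypothetical, and the 5' end of PK-5B is taken as -4093, the value consistent with the stated 4051 bp. The non-templated 5' tails of the primers are not counted.

```python
# Promoter position (relative to the ATG) of the 3'-most base covered by PK-3A.
PK_3A_END = -43

# 5'-most promoter position covered by each forward primer, as given in the text.
FORWARD_PRIMER_START = {
    "NPK8": -4093,   # PK-5B
    "NPK15": -3120,  # PK-5C
    "NPK22": -2110,  # PK-5D
    "NPK27": -1125,  # PK-5E
}


def fragment_length(five_prime: int, three_prime: int = PK_3A_END) -> int:
    """Length in bp of the promoter span from five_prime to three_prime, inclusive."""
    return three_prime - five_prime + 1


SIZES = {name: fragment_length(start) for name, start in FORWARD_PRIMER_START.items()}

if __name__ == "__main__":
    for name, size in SIZES.items():
        print(f"{name}: {size} bp")
```

The computed values (4051, 3078, 2068 and 1083 bp) agree with the sizes given for NPK8, NPK15, NPK22 and NPK27.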

The PK-3A primer contains a HindIII recognition sequence. The different 5' primers
contain a BglII recognition sequence.
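
The presence of these sites can be confirmed by searching the primer sequences for the recognition sequences AAGCTT (HindIII) and AGATCT (BglII). This is a small sketch for checking the primers as printed in Example 9; the variable and function names are hypothetical.

```python
# Recognition sequences of the two restriction enzymes.
SITES = {"HindIII": "AAGCTT", "BglII": "AGATCT"}

# Primer sequences (5' to 3') as given in Example 9.
PRIMERS = {
    "PK-3A": "GCAAGCTTGACGCAGCGCTGCCTGAAACTCG",
    "PK-5B": "CCAGATCTCTGGAACACAGAGTGGCAGTTTCC",
    "PK-5C": "CCAGATCTGCATGAAGTGTGTGGGGATTTGCAG",
    "PK-5D": "GGAGATCTGATCTTGGCTTACTGCAGCCTCTG",
    "PK-5E": "GGAGATCTGTCTGGATTCCTGGGAAGTCCTCA",
}


def sites_in(primer: str) -> dict:
    """Map each enzyme whose site occurs in primer to the 0-based offset of the first hit."""
    primer = primer.upper()
    return {enzyme: primer.find(site)
            for enzyme, site in SITES.items() if site in primer}


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, sites_in(seq))
```

In every primer the site starts at the third base, that is, it is preceded by a two-base 5' overhang and followed by the gene-specific portion.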

The resulting PCR products were purified using the Qiagen QIAquick spin PCR
purification kit, in accordance with the manufacturer's instructions, and then digested
with the restriction enzymes BglII and HindIII. The pGL2 promoter vector was
digested with the same restriction enzymes, and the SV40 promoter contained in this
vector was released and removed. The PCR promoter fragments were ligated into the
vector, which was then transformed into competent DH5α bacteria (from
Gibco/BRL). DNA for the promoter activity analyses, which are described below,
was isolated from transformed bacterial clones using the Qiagen plasmid kit.

Example 10 

The activity of the hTC promoter was analysed in transient transfections in
eukaryotic cells.

All the work with eukaryotic cells was carried out at a sterile workstation. CHO-K1
and HEK 293 cells were obtained from the American Type Culture Collection.

CHO-K1 cells were kept in DMEM Nut Mix F-12 cell culture medium (from Gibco-BRL,
order number: 21331-020) containing 0.15% streptomycin/penicillin, 2 mM
glutamine and 10% FCS (from Gibco-BRL).

HEK 293 cells were cultured in DMEM cell culture medium (from Gibco-BRL, order
number: 41965-039) containing 0.15% streptomycin/penicillin, 2 mM glutamine and
10% FCS (from Gibco-BRL).

CHO-K1 and HEK 293 cells were cultured at 37°C in a water-saturated atmosphere
while being gassed with 5% CO2. When the cell lawn was confluent, the medium
was aspirated, after which the cells were washed with PBS (100 mM KH2PO4 pH
7.2; 150 mM NaCl) and released by adding a trypsin-EDTA solution (from Gibco-BRL).
The trypsin was inactivated by adding medium and the cell count was
determined using a Neubauer counting chamber in order to plate out the cells at the
desired density.

For the transfection, in each case 2x 10^ HEK 293 cells were plated out, per well, in a
24-well cell culture plate. The HEK 293 medium was removed after 3 hours. For the
transfection, up to 2.5 µg of plasmid DNA, 1 µg of a CMV β-Gal plasmid construct
(from Stratagene, order number: 200388), 200 µl of serum-free medium and 10 µl of
transfection reagent (DOTAP from Boehringer Mannheim) were incubated at room
temperature for 15 minutes and then dropped uniformly onto the HEK 293 cells. 1.5
ml of medium were added after 3 hours. The medium was changed after 20 hours.
After a further 24 hours, the cells were harvested for determining the luciferase
activity and the β-Gal activity. For this, the cells were lysed, at room temperature for
15 minutes, in the cell culture lysis reagent (25 mM Tris [pH 7.8] containing H3PO4;
2 mM CDTA; 2 mM DTT; 10% glycerol; 1% Triton X-100). Twenty µl of this cell
lysate were mixed with 100 µl of luciferase assay buffer (20 mM Tricine; 1.07 mM
(MgCO3)4Mg(OH)2·5H2O; 2.67 mM MgSO4; 0.1 mM EDTA; 33.3 mM DTT;
270 µM coenzyme A; 470 µM luciferin; 530 µM ATP), and the light generated by
the luciferase was measured.

In order to measure the β-galactosidase activity, equal quantities of cell lysate and
β-galactosidase assay buffer (100 mM sodium phosphate buffer, pH 7.3; 1 mM MgCl2;
50 mM β-mercaptoethanol; 0.665 mg of ONPG/ml) were incubated at 37°C for at
least 30 minutes or until a slight yellow coloration appeared. The reaction was
stopped by adding 100 µl of 1 M Na2CO3, and the absorbance was determined at
420 nm.
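
The description does not state how the luciferase and β-galactosidase readings are combined. One common convention, assumed here rather than taken from the text, is to divide each luciferase reading by the β-galactosidase absorbance of the same lysate (to correct for transfection efficiency) and then express the result relative to a reference construct. The function name and all numbers below are hypothetical placeholders, not measured values.

```python
def relative_activity(luc: float, bgal: float,
                      luc_ref: float, bgal_ref: float) -> float:
    """Return the beta-Gal-normalized luciferase activity relative to a reference.

    luc, bgal         -- luciferase signal and A420 of the test construct lysate
    luc_ref, bgal_ref -- the same two readings for the reference construct
    """
    if bgal <= 0 or bgal_ref <= 0:
        raise ValueError("beta-galactosidase readings must be positive")
    if luc_ref <= 0:
        raise ValueError("reference luciferase reading must be positive")
    return (luc / bgal) / (luc_ref / bgal_ref)


if __name__ == "__main__":
    # Hypothetical placeholder readings, for illustration only.
    fold = relative_activity(luc=800.0, bgal=0.5, luc_ref=40.0, bgal_ref=1.0)
    print(f"fold over reference: {fold:.1f}")
```

Under this convention, a statement such as "40 times higher than the basal activity" corresponds to a returned value of 40 when the promoterless control is used as the reference.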

In order to analyse the hTC promoter, four hTC promoter sequence segments of
differing length were cloned 5' of the luciferase reporter gene (cf. Example
9).

The relative luciferase activities of two independent transfections in HEK 293 cells,
using the constructs NPK8, NPK15, NPK22 and NPK27, are plotted in Fig. 11. Each
experiment was carried out in duplicate. The standard deviation is also given.
The construct NPK27 exhibits a luciferase activity which is 40 times higher than the
basal activity of the promoterless luciferase control construct (pGL2-Basic) and from
2 to 3 times higher than that of the SV40 promoter control construct (pGL2PRO).
Interestingly, a luciferase activity which was from 2 to 3 times lower than that
obtained with the NPK27 construct was observed in cells which were transfected
with the longer hTC promoter constructs (NPK8, NPK15, NPK22). Similar results were
also observed in CHO cells (data not shown).

SEQUENCE LISTING

<110> Bayer AG 

<120> Regulatory DNA sequences from the 5' region of the gene
for the human catalytic telomerase subunit, and 
their diagnostic and therapeutic use 

<130> LeA32805-Foreign Countries 



<160> 20 

<170> PatentIn Vers. 2.0

<210> 1 

<211> 5126 

<212> DNA 

<213> Homo sapiens 

<400> 1 

gagctctgaa ccgtggaaac gaacatgacc cttgcctgcc tgcctccctg ggtgggtcaa 60 
gggtaatgaa gtggtgtgca ggaaatggcc atgtaaatta cacgactctg ctgatgggga 120 
ccgttccttc catcattatt catcttcacc cccaaggact gaatgattcc agcaacttct 180 
tcgggtgtga caagccatga caaaactcag tacaaacacc actcttttac taggcccaca 240 
gagcacgggc cacacccctg atatattaag agtccaggag agatgaggct gctttcagcc 300 
accaggctgg ggtgacaaca gcggctgaac agtctgttcc tctagactag tagaccctgg 360 
caggcactcc cccaaattct agggcctggt tgctgcttcc cgagggcgcc atctgccctg 420 
gagactcagc ctggggtgcc acactgaggc cagccctgtc tccacaccct ccgcctccag 480 
gcctcagctt ctccagcagc ttcctaaacc ctgggtgggc cgtgttccag cgctactgtc 540 
tcacctgtcc cactgtgtct tgtctcagcg acgtagctcg cacggttcct cctcacatgg 600 
ggtgtctgtc tccttcccca acactcacat gcgttgaagg gaggagattc tgcgcctccc 660 
agactggctc ctctgagcct gaacctggct cgtggccccc gatgcaggtt cctggcgtcc 720
ggctgcacgc tgacctccat ttccaggcgc tccccgtctc ctgtcatctg ccggggcctg 780 
ccggtgtgtt cttctgtttc tgtgctcctt tccacgtcca gctgcgtgtg tctctgcccg 840
ctagggtctc ggggttttta taggcatagg acgggggcgt ggtgggccag ggcgctcttg 900 
ggaaatgcaa catttgggtg tgaaagtagg agtgcctgtc ctcacctagg tccacgggca 960 
caggcctggg gatggagccc ccgccaggga cccgcccttc tctgcccagc actttcctgc 1020 
ccccctccct ctggaacaca gagtggcagt ttccacaagc actaagcatc ctcttcccaa 1080 
aagacccagc attggcaccc ctggacattt gccccacagc cctgggaatt cacgtgacta 1140 
cgcacatcat gtacacactc ccgtccacga ccgacccccg ctgttttatt ttaatagcta 1200 
caaagcaggg aaatccctgc taaaatgtcc tttaacaaac tggttaaaca aacgggtcca 1260 
tccgcacggt ggacagttcc tcacagtgaa gaggaacatg ccgtttataa agcctgcagg 1320
catctcaagg gaattacgct gagtcaaaac tgccacctcc atgggatacg tacgcaacat 1380 
gctcaaaaag aaagaatttc accccatggc aggggagtgg ttaggggggt taaggacggt 1440 
gggggcggca gctgggggct actgcacgca ccttttacta aagccagttt cctggttctg 1500 
atggtattgg ctcagttatg ggagactaac cataggggag tggggatggg ggaacccgga 1560
ggctgtgcca tctttgccat gcccgagtgt cctgggcagg ataatgctct agagatgccc 1620 
acgtcctgat tcccccaaac ctgtggacag aacccgcccg gccccagggc ctttgcaggt 1680 
gtgatctccg tgaggaccct gaggtctggg atccttcggg actacctgca ggcccgaaaa 1740
gtaatccagg ggttctggga agaggcgggc aggagggtca gaggggggca gcctcaggac 1800 
gatggaggca gtcagtctga ggctgaaaag ggagggaggg cctcgagccc aggcctgcaa 1860 
gcgcctccag aagctggaaa aagcggggaa gggaccctcc acggagcctg cagcaggaag 1920 
gcacggctgg cccttagccc accagggccc atcgtggacc tccggcctcc gtgccatagg 1980 
agggcactcg cgctgccctt ctagcatgaa gtgtgtgggg atttgcagaa gcaacaggaa 2040 
acccatgcac tgtgaatcta ggattatttc aaaacaaagg tttacagaaa catccaagga 2100 
cagggctgaa gtgcctccgg gcaagggcag ggcaggcacg agtgatttta tttagctatt 2160
ttattttatt tacttacttt ctgagacaga gttatgctct tgttgcccag gctggagtgc 2220 
agcggcatga tcttggctca ctgcaacctc cgtctcctgg gttcaagcaa ttctcgtgcc 2280 
tcagcctccc aagtagctgg gatttcaggc gtgcaccacc acacccggct aattttgtat 2340 
ttttagtaga gatgggcttt caccatgttg gtcaagctga tctcaaaatc ctgacctcag 2400 
gtgatccgcc cacctcagcc tcccaaagtg ctgggattac aggcatgagc cactgcacct 2460 
ggcctattta accattttaa aacttccctg ggctcaagtc acacccactg gtaaggagtt 2520 
catggagttc aatttcccct ttactcagga gttaccctcc tttgatattt tctgtaattc 2580 
ttcgtagact ggggatacac cgtctcttga catattcaca gtttctgtga ccacctgtta 2640 
tcccatggga cccactgcag gggcagctgg gaggctgcag gcttcaggtc ccagtggggt 2700 
tgccatctgc cagtagaaac ctgatgtaga atcagggcgc aagtgtggac actgtcctga 2760 
atctcaatgt ctcagtgtgt gctgaaacat gtagaaatta aagtccatcc ctcctactct 2820 
actgggattg agccccttcc ctatcccccc ccaggggcag aggagttcct ctcactcctg 2880 

tgttggtttg tttgttttgt tttgagaggc ggtttcactc ttgttgctca ggctggaggg 3000 
agtgcaatgg cgcgatcttg gcttactgca gcctctgcct cccaggttca agtgattctc 3060 
ctgcttccgc ctcccatttg gctgggatta caggcacccg ccaccatgcc cagctaattt 3120 
tttgtatttt tagtagagac gggggtgggt ggggttcacc atgttggcca ggctggtctc 3180 
gaacttctga cctcagatga tccacctgcc tctgcctcct aaagtgctgg gattacaggt 3240 
gtgagccacc atgcccagct cagaatttac tctgtttaga aacatctggg tctgaggtag 3300
gaagctcacc ccactcaagt gttgtggtgt tttaagccaa tgatagaatt tttttattgt 3360
tgttagaaca ctcttgatgt tttacactgt gatgactaag acatcatcag cttttcaaag 3420
acacactaac tgcacccata atactggggt gtcttctggg tatcagcaat cttcattgaa 3480
tgccgggagg cgtttcctcg ccatgcacat ggtgttaatt actccagcat aatcttctgc 3540
ttccatttct tctcttccct cttttaaaat tgtgttttct atgttggctt ctctgcagag 3600
aaccagtgta agctacaact taacttttgt tggaacaaat tttccaaacc gcccctttgc 3660
cctagtggca gagacaattc acaaacacag ccctttaaaa aggcttaggg atcactaagg 3720
ggatttctag aagagcgacc tgtaatccta agtatttaca agacgaggct aacctccagc 3780
gagcgtgaca gcccagggag ggtgcgaggc ctgttcaaat gctagctcca taaataaagc 3840
aatttcctcc ggcagtttct gaaagtagga aaggttacat ttaaggttgc gtttgttagc 3900
atttcagtgt ttgccgacct cagctacagc atccctgcaa ggcctcggga gacccagaag 3960
tttctcgccc ccttagatcc aaacttgagc aacccggagt ctggattcct gggaagtcct 4020
cagctgtcct gcggttgtgc cggggcccca ggtctggagg ggaccagtgg ccgtgtggct 4080
tctactgctg ggctggaagt cgggcctcct agctctgcag tccgaggctt ggagccaggt 4140
gcctggaccc cgaggctgcc ctccaccctg tgcgggcggg atgtgaccag atgttggcct 4200
catctgccag acagagtgcc ggggcccagg gtcaaggccg ttgtggctgg tgtgaggcgc 4260
ccggtgcgcg gccagcagga gcgcctggct ccatttccca ccctttctcg acgggaccgc 4320
cccggtgggt gattaacaga tttggggtgg tttgctcatg gtggggaccc ctcgccgcct 4380
gagaacctgc aaagagaaat gacgggcctg tgtcaaggag cccaagtcgc ggggaagtgt 4440
tgcagggagg cactccggga ggtcccgcgt gcccgtccag ggagcaatgc gtcctcgggt 4500
tcgtccccag ccgcgtctac gcgcctccgt cctccccttc acgtccggca ttcgtggtgc 4560
ccggagcccg acgccccgcg tccggacctg gaggcagccc tgggtctccg gatcaggcca 4620
gcggccaaag ggtcgccgca cgcacctgtt cccagggcct ccacatcatg gcccctccct 4680
cgggttaccc cacagcctag gccgattcga cctctctccg ctggggccct cgctggcgtc 4740
cctgcaccct gggagcgcga gcggcgcgcg ggcggggaag cgcggcccag acccccgggt 4800
ccgcccggag cagctgcgct gtcggggcca ggccgggctc ccagtggatt cgcgggcaca 4860
gacgcccagg accgcgctcc ccacgtggcg gagggactgg ggacccgggc acccgtcctg 4920
ccccttcacc ttccagctcc gcctcctccg cgcggacccc gccccgtccc gacccctccc 4980
gggtccccgg cccagccccc tccgggccct cccagcccct ccccttcctt tccgcggccc 5040
cgccctctcc tcgcggcgcg agtttcaggc agcgctgcgt cctgctgcgc acgtgggaag 5100
ccctggcccc ggccaccccc gcgatg 5126

<210> 2

<211> 4042

<212> DNA

<213> Homo sapiens

<400> 2

gtttcaggca gcgctgcgtc ctgctgcgca cgtgggaagc cctggccccg gccacccccg 60
cgatgccgcg cgctccccgc tgccgagccg tgcgctccct gctgcgcagc cactaccgcg 120
aggtgctgcc gctggccacg ttcgtgcggc gcctggggcc ccagggctgg cggctggtgc 180
agcgcgggga cccggcggct ttccgcgcgc tggtggccca gtgcctggtg tgcgtgccct 240
gggacgcacg gccgcccccc gccgccccct ccttccgcca ggtgtcctgc ctgaaggagc 300
tggtggcccg agtgctgcag aggctgtgcg agcgcggcgc gaagaacgtg ctggccttcg 360
gcttcgcgct gctggacggg gcccgcgggg gcccccccga ggccttcacc accagcgtgc 420
gcagctacct gcccaacacg gtgaccgacg cactgcgggg gagcggggcg tgggggctgc 480
tgctgcgccg cgtgggcgac gacgtgctgg ttcacctgct ggcacgctgc gcgctctttg 540
tgctggtggc tcccagctgc gcctaccagg tgtgcgggcc gccgctgtac cagctcggcg 600
ctgccactca ggcccggccc ccgccacacg ctagtggacc ccgaaggcgt ctgggatgcg 660
aacgggcctg gaaccatagc gtcagggagg ccggggtccc cctgggcctg ccagccccgg 720
gtgcgaggag gcgcgggggc agtgccagcc gaagtctgcc gttgcccaag aggcccaggc 780
gtggcgctgc ccctgagccg gagcggacgc ccgttgggca ggggtcctgg gcccacccgg 840
gcaggacgcg tggaccgagt gaccgtggtt tctgtgtggt gtcacctgcc agacccgccg 900
aagaagccac ctctttggag ggtgcgctct ctggcacgcg ccactcccac ccatccgtgg 960
gccgccagca ccacgcgggc cccccatcca catcgcggcc accacgtccc tgggacacgc 1020
cttgtccccc ggtgtacgcc gagaccaagc acttcctcta ctcctcaggc gacaaggagc 1080
agctgcggcc ctccttccta ctcagctctc tgaggcccag cctgactggc gctcggaggc 1140
tcgtggagac catctttctg ggttccaggc cctggatgcc agggactccc cgcaggttgc 1200
cccgcctgcc ccagcgctac tggcaaatgc ggcccctgtt tctggagctg cttgggaacc 1260
acgcgcagtg cccctacggg gtgctcctca agacgcactg cccgctgcga gctgcggtca 1320
ccccagcagc cggtgtctgt gcccgggaga agccccaggg ctctgtggcg gcccccgagg 1380
aggaggacac agacccccgt cgcctggcgc agctgctccg ccagcacagc agcccctggc 1440
aggtgtacgg cttcgtgcgg gcctgcctgc gccggctggt gcccccaggc ctctggggct 1500
ccaggcacaa cgaacgccgc ttcctcagga acaccaagaa gttcatctcc ctggggaagc 1560
atgccaagct ctcgctgcag gagctgacgt ggaagatgag cgtgcgggac tgcgcttggc 1620
tgcgcaggag cccaggggtt ggctgtgttc cggccgcaga gcaccgtctg cgtgaggaga 1680
tcctggccaa gttcctgcac tggctgaCga gtgtgtacgt cgtcgagctg ctcaggtctt 1740
tcttttatgt cacggagacc acgtttcaaa agaacaggct ctttttctac cggaagagtg 1800
tctggagcaa gttgcaaagc atcggaatca gacagcactt gaagagggtg cagctgcggg 1860
agctgtcgga agcagaggtc aggcagcatc gggaagccag gcccgccctg ctgacgtcca 1920
gactccgctt catccccaag cctgacgggc tgcggccgat Cgtgaacatg gactacgtcg 1980
tgggagccag aacgttccgc agagaaaaga gggccgagcg tctcacctcg agggtgaagg 2040
cactgttcag cgtgctcaac tacgagcggg cgcggcgccc cggcctcctg ggcgcctctg 2100
tgctgggcct ggacgatatc cacagggcct ggcgcacctt cgtgctgcgt gtgcgggccc 2160
aggacccgcc gcctgagctg tactttgtca aggtggatgt gacgggcgcg tacgacacca 2220
tcccccagga caggctcacg gaggtcatcg ccagcatcat caaaccccag aacacgtact 2280
gcgtgcgtcg gtatgccgtg gtccagaagg ccgcccatgg gcacgtccgc aaggccttca 2340
agagccacgt ctctaccttg acagacctcc agccgtacat gcgacagttc gtggctcacc 2400
tgcaggagac cagcccgctg agggatgccg tcgtcatcga gcagagctcc tccctgaatg 2460 
aggccagcag tggcctcttc gacgtcttcc tacgcttcat gtgccaccac gccgtgcgca 2520 
tcaggggcaa gtcctacgtc cagtgccagg ggatcccgca gggctccatc ctctccacgc 2580 
tgctctgcag cctgtgctac ggcgacatgg agaacaagct gtttgcgggg attcggcggg 2640
acgggctgct cctgcgtttg gtggatgatt tcttgttggt gacacctcac ctcacccacg 2700 
cgaaaacctt cctcaggacc ctggtccgag gtgtccctga gtatggctgc gtggtgaact 2760 
tgcggaagac agtggtgaac ttccctgtag aagacgaggc cctgggtggc acggcttttg 2820
ttcagatgcc ggcccacggc ctattcccct ggtgcggcct gctgctggat acccggaccc 2880 
tggaggtgca gagcgactac tccagctatg cccggacctc catcagagcc agtctcacct 2940 
tcaaccgcgg cttcaaggct gggaggaaca tgcgtcgcaa actctttggg gtcttgcggc 3000 
tgaagtgtca cagcctgttt ctggatttgc aggtgaacag cctccagacg gtgtgcacca 3060
acatctacaa gatcctcctg ctgcaggcgt acaggtttca cgcatgtgtg ctgcagctcc 3120 
catttcatca gcaagtttgg aagaacccca catttttcct gcgcgtcatc tctgacacgg 3180
cctcccCctg ctactccatc ctgaaagcca agaacgcagg gatgtcgctg ggggccaagg 3240 
gcgccgccgg ccctctgccc tccgaggccg tgcagtggct gtgccaccaa gcattcctgc 3300 
tcaagctgac tcgacaccgt gtcacctacg tgccactcct ggggtcactc aggacagccc 3360
agacgcagct gagtcggaag ctcccgggga cgacgctgac tgccctggag gccgcagcca 3420
acccggcact gccctcagac ttcaagacca tcctggactg atggccaccc gcccacagcc 3480 
aggccgagag cagacaccag cagccctgtc acgccgggct ctacgtccca gggagggagg 3540 
ggcggcccac acccaggccc gcaccgctgg gagtctgagg cctgagtgag tgtttggccg 3600
aggcctgcat gtccggctga aggctgagtg tccggctgag gcctgagcga gtgtccagcc 3660
aagggctgag tgtccagcac acctgccgtc ttcacttccc cacaggctgg cgctcggctc 3720 
caccccaggg ccagcttttc ctcaccagga gcccggcttc cactccccac ataggaatag 3780 
tccatcccca gattcgccat tgttcacccc tcgccctgcc ctcctttgcc ttccaccccc 3840 
accatccagg tggagaccct gagaaggacc ctgggagctc tgggaatttg gagtgaccaa 3900
aggtgtgccc tgtacacagg cgaggaccct gcacctggat gggggtccct gtgggtcaaa 3960 
ttggggggag gtgctgtggg agtaaaatac tgaatatatg agtttttcag ttttgaaaaa 4020 
aaaaaaaaaa aaaaaaaaaa aa 4042 

<210> 3 
<211> 11276 
<212> DNA 

<213> Homo sapiens 
<400> 3 

acttgagccc aagagttcaa ggctacggtg agccatgatt gcaacaccac acgccagcct 60 
tggtgacaga atgagaccct gtctcaaaaa aaaaaaaaaa aattgaaata atataaagca 120
tcttctctgg ccacagtgga acaaaaccag aaatcaacaa caagaggaat tttgaaaact 180 
atacaaacac atgaaaatta aacaatatac ttctgaatga ccagtgagtc aatgaagaaa 240 
ttaaaaagga aattgaaaaa tttatttaag caaatgataa cggaaacata acctctcaaa 300
acccacggta tacagcaaaa gcagtgctaa gaaggaagtt tatagctata agcagctaca 360 
tcaaaaaagt agaaaagcca ggcgcagtgg ctcatgcctg taatcccagc actttgggag 420 
gccaaggcgg gcagatcgcc tgaggtcagg agttcgagac cagcctgacc aacacagaga 480
aaccttgtcg ctactaaaaa tacaaaatta gctgggcatg gtggcacatg cctgtaatcc 540 
cagctactcg ggaggctgag gcaggataac cgcttgaacc caggaggtgg aggttgcggt 600
gagccgggat tgcgccattg gactccagcc tgggtaacaa gagtgaaacc ctgtctcaag 660
aaaaaaaaaa aagtagaaaa acttaaaaat acaacctaat gatgcacctt aaagaactag 720 
aaaagcaaga gcaaactaaa cctaaaattg gtaaaagaaa agaaataata aagatcagag 780
cagaaataaa tgaaactgaa agataacaat acaaaagatc aacaaaatta aaagttggtt 840
ttttgaaaag ataaacaaaa ttgacaaacc tttgcccaga ctaagaaaaa aggaaagaag 900 
acctaaataa ataaagtcag agatgaaaaa agagacatta caactgatac cacagaaatc 960 
caaaggatca ctagaggcta ctatgagcaa ctgtacacta ataaattgaa aaacctagaa 1020 
aaaatagata aattcctaga tgcatacaac ctaccaagat tgaaccatga agaaatccaa 1080 
agcccaaaca gaccaataac aataatggga ttaaagccat aataaaaagt ctcctagcaa 1140 
agagaagccc aggacccaat ggcttccctg ctggatttta ccaatcattt aaagaagaat 1200 
gaattccaat cctactcaaa ctattctgaa aaatagagga aagaatactt ccaaactcat 1260 
tctacatggc cagtatCacc ctgattccaa aaccagacaa aaacacatca aaaacaaaca 1320 
aacaaaaaaa cagaaagaaa gaaaactaca ggccaatatc cctgatgaat actgatacaa 1380
aaatcctcaa caaaacacta gcaaaccaaa ttaaacaaca ccttcgaaag atcattcatt 1440 
gtgatcaagt gggatttatt ccagggatgg aaggatggct caacatatgc aaatcaatca 1500 
atgtgataca tcatcccaac aaaatgaagt acaaaaacta tatgattatt tcactttatg 1560
cagaaaaagc atttgataaa attctgcacc cttcatgata aaaaccctca aaaaaccagg 1620 
tatacaagaa acatacaggc caggcacagt ggctcacacc tgcgatccca gcactctggg 1680 
aggccaaggt gggatgattg cttgggccca ggagtttgag actagcctgg gcaacaaaat 1740
gagacctggt ctacaaaaaa cttttttaaa aaattagcca ggcatgaCgg catatgcctg 1800 
tagtcccagc tagtctggag gctgaggtgg gagaatcact taagcctagg aggtcgaggc 1860
tgcagtgagc catgaacatg tcactgtact ccagcctaga caacagaaca agaccccact 1920 
gaataagaag aaggagaagg agaagggaga agggagggag aagggaggag gaggagaagg 1980
aggaggtgga ggagaagtgg aaggggaagg ggaagggaaa gaggaagaag aagaaacata 2040
ttCcaacata ataaaagccc tatatgacag accgaggtag tattatgagg aaaaactgaa 2100 
agcctttcct ctaagatctg gaaaatgaca agggcccact ttcaccactg tgattcaaca 2160
tagtactaga agtcctagct agagcaatca gataagagaa agaaataaaa ggcaCccaaa 2220 
ctggaaagga agaagtcaaa ttatcctgtt tgcagatgat atgatcttat atctggaaaa 2280 
gacttaagac accactaaaa aactattaga gctgaaattt ggtacagcag gatacaaaat 2340 
caatgtacaa aaatcagtag tatttctata ttccaacagc aaacaatctg aaaaagaaac 2400 
caaaaaagca gctacaaata aaattaaaca gctaggaatt aaccaaagaa gtgaaagatc 2460 
tctacaatga aaactataaa atgttgataa aagaaattga agagggcaca aaaaaagaaa 2520
agatattcca tgttcataga ttggaagaat aaatactgtt aaaatgtcca tactacccaa 2580
agcaatttac aaattcaatg caatccctat taaaatacta atgacgttct tcacagaaat 2640
agaagaaaca attctaagat ttgtacagaa ccacaaaaga cccagaatag ccaaagctat 2700 
cctgaccaaa aagaacaaaa ctggaagcat cacattacct gacttcaaat tatactacaa 2760 
agctatagta acccaaacta catggtactg gcataaaaac agatgagaca tggaccagag 2820
gaacagaata gagaatccag aaacaaatcc atgcatctac agtgaactca tttttgacaa 2880 
aggtgccaag aacatacttt ggggaaaaga taatctcttc aataaatggt gctggaggaa 2940 
ctggatatcc atatgcaaaa taacaatact agaactctgt ctctcaccat atacaaaagc 3000
aaatcaaaat ggatgaaagg cttaaatcta aaacctcaaa ctttgcaact actaaaagaa 3060
aacaccggag aaactctcca ggacattgga gtgggcaaag acttcttgag taattccctg 3120 
caggcacagg caaccaaagc aaaaacagac aaatgggatc atatcaagtt aaaaagcttc 3180 
tgcccagcaa aggaaacaat caacaaagag aagagacaac ccacagaatg ggagaatata 3240 
tttgcaaact attcatctaa caaggaatta ataaccagta tatataagga gctcaaacta 3300 
ctctataaga aaaacaccta ataagctgat tttcaaaaat aagcaaaaga tctgggtaga 3360 
catttctcaa aataagtcat acaaatggca aacaggcatc tgaaaatgtg ctcaacacca 3420 
ctgatcatca gagaaatgca aatcaaaact actatgagag atcatctcat cccagttaaa 3480 
atggctttta ttcaaaagac aggcaataac aaatgccagt gaggatgtgg ataaaaggaa 3540 
acccttggac actgttggtg ggaatggaaa ttgctaccac tatggagaac agtttgaaag 3600 
ttcctcaaaa aactaaaaat aaagctacca tacagcaatc ccattgctag gtatatactc 3660 
caaaaaaggg aatcagtgta tcaacaagct atctccactc ccacatttac tgcagcactg 3720 
ttcatagcag ccaaggtttg gaagcaacct cagtgtccat caacagacga atggaaaaag 3780 
aaaatgtggt gcacatacac aatggagtac tacgcagcca taaaaaagaa tgagatcctg 3840
tcagttgcaa cagcatgggg ggcactggtc agtatgttaa gtgaaataag ccaggcacag 3900 
aaagacaaac ttttcatgtt ctcccttact tgtgggagca aaaattaaaa caattgacat 3960 
agaaatagag gagaatggtg gttctagagg ggtgggggac agggtgacta gagtcaacaa 4020 
taatttattg tatgttttaa aataactaaa agagtataat tgggttgttt gtaacacaaa 4080 
gaaaggataa atgcttgaag gtgacagata ccccatttac cctgatgtga ttattacaca 4140 
ttgtatgcct gtatcaaaat atctcatgta tgctatagat ataaacccta ctatattaaa 4200 
aattaaaatt ttaatggcca ggcacggtgg ctcatgtccg taatcccagc actttgggag 4260 
gccgaggcgg gtggatcacc tgaggtcagg agtttgaaac cagtctggcc accatgatga 4320
aaccctgtct ctactaaaga tacaaaaatt agccaggcgt ggtggcacat acctgtagtc 4380 
ccaactactc aggaggctga gacaggagaa ttgcttgaac ctgggaggcg gaggttgcag 4440 
tgagccgaga tcatgccact gcactgcagc ctgggtgaca gagcaagact ccatctcaaa 4500 
acaaaaacaa aaaaaagaag attaaaattg taatttttat gtaccgtata aatatatact 4560 
ctactatatt agaagttaaa aattaaaaca attataaaag gtaattaacc acttaatcta 4620 
aaataagaac aaLgtatgtg gggtttctag cttctgaaga agtaaaagtt atggccacga 4680 
Cggcagaaat gtgaggaggg aacagtggaa gttactgttg ttagacgctc atactctctg 4740 
taagtgactt aattttaacc aaagacaggc tgggagaagt taaagaggca ttctataagc 4800 
cctaaaacaa ctgctaataa tggtgaaagg taatctctat taattaccaa taattacaga 4860 
tatctctaaa atcgagctgc agaattggca cgtctgatca caccgtcctc tcattcacgg 4920 
tgcttttttt cttgtgtgct tggagatttt cgattgtgtg ttcgtgtttg gttaaactta 4980 
atctgtatga atcctgaaac gaaaaatggt ggtgatttcc tccagaagaa ttagagtacc 5040 
tggcaggaag caggtggctc tgtggacctg agccacttca atcttcaagg gtctctggcc 5100 
aagacccagg tgcaaggcag aggcctgatg acccgaggac aggaaagctc ggatgggaag 5160 
gggcgatgag aagcctgcct cgttggtgag cagcgcatga agtgccctta tttacgcttt 5220 
gcaaagattg ctctggatac catctggaaa aggcggccag cgggaatgca aggagtcaga 5280
agcctcctgc tcaaacccag gccagcagct atggcgccca cccgggcgtg tgccagaggg 5340 
agaggagtca aggcacctcg aagtatggct taaatctttt tttcacctga agcagtgacc 5400 
aaggtgtatt ctgagggaag cttgagttag gtgccttctt taaaacagaa agtcatggaa 5460 
gcacccttct caagggaaaa ccagacgccc gctctgcggt catttacctc tttcctctct 5520 
ccctctcttg ccctcgcggt ttctgatcgg gacagagtga cccccgtgga gcttctccga 5580 
gcccgtgctg aggaccctct tgcaaagggc tccacagacc cccgccctgg agagaggagt 5640
ctgagcctgg cttaataaca aactgggatg tggctggggg cggacagcga cggcgggatt 5700 
caaagactta attccatgag taaattcaac ctttccacat ccgaatggat ttggatttta 5760
tcttaatatt ttcttaaatt tcatcaaata acattcagga ctgcagaaat ccaaaggcgt 5820 
aaaacaggaa ctgagctatg tttgccaagg tccaaggact taataaccat gttcagaggg 5880
atttttcgcc ctaagtactt tttattggtt ttcataaggt ggcttagggt gcaagggaaa 5940
gtacacgagg agaggcctgg gcggcagggc tatgagcacg gcagggccac cggggagaga 6000 
gtccccggcc tgggaggctg acagcaggac cactgaccgt cctccctggg agctgccaca 6060 
ttgggcaacg cgaaggcggc cacgctgcgt gtgactcagg accccatacc ggcttcctgg 6120 
gcccacccac actaacccag gaagtcacgg agctctgaac ccgtggaaac gaacatgacc 6180 
cttgcctgcc tgcttccctg ggtgggtcaa gggtaatgaa gtggtgtgca ggaaatggcc 6240 
atgnaaatta cacgactctg ctgatgggga ccgttccttc catcattatt catcttcacc 6300 
cccaaggact gaatgattcc agcaacttct tcgggtgtga caagccatga caaaactcag 6360 
tacaaacacc actcttttac taggcccaca gagcacggsc cacacccctg atatattaag 6420 
agtccaggag agatgaggct gctttcagcc accaggctgg ggtgacaaca gcggctgaac 



tctgttcc tctagactag tagaccctgg caggcactcc cc. 



, 6540 

tgctgcttcc cgagggcgcc atctgccctg gagactcagc ctggggtgcc acactgaggc 6600

cagccctgtc tccacaccct ccgcctccag gcctcagctt ctccagcagc ttcctaaacc 6660

ctgggtgggc cgtgttccag cgctactgtc tcacctgtcc cactgtgtct tgtctcagcg 6720 

acgtagctcg cacggttcct cctcacatgg ggtgtctgtc tccttcccca acactcacat 6780 

gcgttgaagg gaggagattc tgcgcctccc agactggctc ctctgagcct gaacctggct 6840 

cgtggccccc gatgcaggtt cctggcgtcc ggctgcacgc tgacctccat ttccaggcgc 6900 

tccccgtctc ctgtcatctg ccggggccCg ccggtgtgtt ctcctgtttc tgtgctcctt 6960 

tccacgtcca gctgcgtgtg Cctctgcccg ctagggtctc ggggttttta taggcatagg 7020 

acgggggcgt ggtgggccag ggcgctcttg ggaaatgcaa catttgggtg tgaaagtagg 7080 

agtgcctgtc ctcacctagg tccacgggca caggcctggg gatggagccc ccgccaggga 7140 

cccgcccttc tctgcccagc actttcctgc ccccctccct ctggaacaca gagtggcagt 7200 
ttccacaagc
gccccacagc 
ccgacccccg 
tttaacaaac 
gaggaacatg 
tgccacctcc 
aggggagtgg 
ccttttacta 
cataggggag 
cctgggcagg 
aacccgcccg 
atccttcggg 
aggagggtca 



gggaccctcc 
atcgtggacc 
gtgtgtgggg 
aaaacaaagg 
ggcaggcacg 
gttatgctct 
cgtctcctgg 

gtcaagctga 
ctgggattac 
ggctcaagtc 

catattcaca 
gaggctgcag 
atcagggcgc 
gtagaaatta 
ccaggggcag 
cactgctggt 
ggtttcactc 
gcctctgcct 
caggcacccg 
ggggttcacc 
tctgcctcct 
tctgtttaga 

gatgactaag 
gtcttctggg 
ggtgttaatt 
tgtgttttct 
tggaacaaat 
ccctttaaaa 
agtatttaca 
ctgttcaaat 
aaggttacat 
atccctgcaa 
aacccggagt 
ggtctggagg 
agctctgcag 
tgcgggcggg 
gtcaaggccg 



actaagcatc 
cctgggaatt 
ctgttttatt 
tggttaaaca 

atgggatacg 
ttaggggggt 
aagccagttt 
tggggatggg 
ataatgctct 
gccccagggc 
actacctgca 
gaggggggca 
cctcgagccc 
acggagcctg 

atttgcagaa 
tttacagaaa 
agtgatttta 
tgttgcccag 
gttcaagcaa 
acacccggct 



aggcatgagc 
acacccactg 

gtttctgtga 
gcttcaggtc 
aagtgtggac 
aagtccatcc 
aggagttcct 
actgaatcca 
ttgttgctca 



cacgtgacta 
ttaatagcta 
aacgggtcca 
agcctgcagg 

taaggacggt 
cctggttctg 
ggaacccgga 
agagatgccc 
ctttgcaggt 
ggcccgaaaa 
gcctcaggac 
aggcctgcaa 
cagcaggaag 
gtgccatagg 
gcaacaggaa 
catccaagga 
tttagctatt 
gctggagtgc 
ttctcgtgcc 
aattttgtat 
ctgacctcag 
cactgcacct 
gtaaggagtt 
tctgtaattc 
ccacctgtta 
ccagtggggt 
actgtcctga 

ctcactcctg 
ctgtttcatt 
ggctggaggg 
agtgattctc 



I gati 



tgc. 

atgttggcca ggctggt 
aaagtgctgg 
aacatctggg 
tgatagaatt 
acatcatcag 

actccagcat 
atgttggctt 



iggt 



ttt 



aggcttaggg 
agacgaggct 
gctagctcca 
ttaaggttgc 
ggcctcggga 
ctggattcct 
ggaccagtgg 
tccgaggctt 
atgtgaccag 
ttgtggctgg 
:tcg 



gtggggacci 
gtcaaggag cccaagtcgc 
ggagcaatgc 
acgtccggca 
tgggtctccg ( 



ggtag 
tttttattgt 
cttttcaaag 
cttcattgaa 
aatcttctgc 
ctctgcagag 
gcccctttgc 
atcactaagg 

taaataaagc 
gtttgttagc 
gacccagaag 
gggaagtcct 
ccgtgtggct 
ggagccaggt 
atgttggcct 
tgtgaggcgc 
acgggaccgc 
ctcgccgcct 
ggggaagtgt 
gtcctcgggt 
ttcgtggtgc 



ctggggc 
cgcggcccag 
ccagtggatt 



ggcggggaag 
ggccgggctc 
gagggactgg 
cgcggacccc 



agcgctgcgt cctgctgcgc ■ 



igctggcgtc 
acccccgggt 
cgcgggcaca 



cgcacatcat 
caaagcaggg 
tccgcacggt 
catctcaagg 
gctcaaaaag 
gggggcggca 
atggtattgg 
ggctgtgcca 
acgtcctgat 
gtgatctccg 
gtaatccagg 
gatggaggca 
gcgcctccag 
gcacggctgg 
agggcactcg 

cagggctgaa 
ttattttatt 
agcggcatga 
tcagcctccc 
ttttagtaga 
gtgatccgcc 
ggcctattta 
catggagttc 
ttcgtagact 
tcccatggga 
tgccatctgc 
atctcaatgt 
actgggattg 
tggaggaagg 
tgttggtttg 
agtgcaatgg 
ctgcttccgc 
tttgtatttt 
gaacttctga 
gtgagccacc 
gaagctcacc 
tgttagaaca 
acacactaac 
tgccgggagg 

aaccagtgta 
cctagtggca 
ggatttctag 
gagcgtgaca 

atttcagtgt 
tttctcgccc 
cagctgtcct 
tctactgctg 
gcctggaccc 
catctgccag 
ccggtgcgcg 
cccggtgggt 
gagaacctgc 
tgcagggagg 
tcgtccccag 
ccggagcccg 
gcggccaaag 
cgggttaccc 
cctgcaccct 
ccgcccggag 
gacgcccagg 



gtacacactc 
aaatccctgc 
ggacagttcc 
gaattacgct 
aaagaatttc 
gctgggggct 
ctcagttatg 
tctttgccat 

tgaggaccct 
ggttctggga 
gtcagtctga 
aagctggaaa 
cccttagccc 
cgctgccctt 
tgtgaatcta 
gtgcctccgg 
tacttacttt 
tcttggctca 
aagtagctgg 
gatgggcttt 
cacctcagcc 

ggggatacac 
cccactgcag 
cagtagaaac 
ctcagtgtgt 

aatgatactt 
tttgttttgt 
cgcgatcttg 
ctcccatttg 
tagtagagac 
cctcagatga 
atgcccagct 
ccactcaagt 
ctcttgatgt 

tctcttccct 

gagacaattc 
aagagcgacc 
gcccagggag 
ggcagtttct 
ttgccgacct 

gcggttgtgc 
ggctggaagt 
cgaggctgcc 
acagagtgcc 
gccagcagga 
gattaacaga 



ctggacattt 
ccgtccacga 
taaaatgtcc 
tcacagtgaa 
gagtcaaaac 
accccatggc 
actgcacgca 
ggagactaac 
gcccgagtgt 
ctgtggacag 
gaggtctggg 
agaggcgggc 
ggctgaaaag 
aagcggggaa 
accagggccc 
ctagcatgaa 

gcaagggcag 
ctgagacaga 
ctgcaacctc 
gatttcaggc 
caccatgttg 
tcccaaagtg 
aacttccctg 
ttactcagga 
cgtctcttga 
gggcagctgg 
ctgatgtaga 
gctgaaacat 



tgttattttt 
tttgagaggc 
gcttactgca 
gctgggatta 
gggggtgggt 
tccacctgcc 
cagaatttac 
gttgtggtgt 
tttacactgt 
atactggggt 
ccatgcacat 



cactccggga 
ccgcgtctac 
acgccccgcg 
ggtcgccgca 
cacagcctag 
gggagcgcga 
cagctgcgct 
accgcgctcc 

cccagccccc 
tcgcggcgcg 
ggccaccccc 



taacttttgt 
acaaacacag 
tgtaatccta 
ggtgcgaggc 
gaaagtagga 
cagctacagc 
aaacttgagc 
cggggcccca 
cgggcctcct 

ggggcccagg 
gcgcctggct 
tttggggtgg 
gacgggcctg 
ggtcccgcgt 
gcgcctccgt 
:ggacctg 



cgc, 



:ctgti 
:cgattcga 
gcggcgcgcg 
gtcggggcca 
ccacgtggcg 
gcctcctccg 
tccgggccct 
agtttcaggc 
gcgatg 



7260 

7320 

7380 

7440 

7500 

7560 

7620 

7680 

7740 

7800 

7860 

7920 

7980 

8040 

8100 

8160 

8220 

8280 

8340 

8400 

8460 

8520 

8580 

8640 

8700 

8760 

8820 

8880 

8940 

9000 

9060 

9120 

9180 

9240 

9300 

9360 

9420 

9480 

9540 

9600 

9660 

9720 

9780 

9840 

9900 

9960 

10020 

10080 

10140 

10200 

10260 

10320 

10380 

10440 

10500 

10560 

10620 

10680 

10740 

10800 

10860

10920 

10980 

11040 

11100 

11160 

11220 

11276 



<210> 4
<211> 104
<212> DNA 

<213> Homo sapiens 
<400> 4 

gtgggcctcc ccggggtcgg cgtccggctg < 
catgcggaga gcagcgcagg cgactcaggg ( 
<210> 5

<211> 8616 

<212> DNA 

<213> Homo sapiens 



<400> 5 
gtgaggaggt 
aaaagggggc 
ttttcgctca 

cgaggccaga 
tggggagaag 

gggtgggagg 
agatttaatt 
gggaagctga 
ggtgaaaccc 
taatcccagc 
tgcagtgagc 



actgttctcc 
agaggacagc 
atggtgctgc 
cctccgctcc 
gaagtcccga 
cagacaagga 
aaaagtcata 
tgctaactcg 

agtcagataa 
gagagtttga 
aggtcacaat 
cacgtgcagg 
ggcgcggccc 

gagtgaggcg 
gggtgtccct 
tagggtgagt 
gtccctgggt 
cacgtgcagg 
ggcgctgtcc 
ccttggcgtt 
gcgccggttg 

ggactgcagg 
gcctctgttg 

ttgctggaga 
cccctcactt 
gcaacgcttg 
ccagtcgccc 
cctgctctga 
ttgggtctta 
gctttttctt 
cctctaagtg 
cactttcaag 

cttttaagta 
gtttatgttc 

ggagccttgc 
acatcctgtc 
aagcttctgt 
gggggatggg 
gcatgcacgt 



tcacaggtca 
agtagctgga 
gacagggtct 
tacctcggct 



ggtggccgtc 
aggcagagcc 
ggacgtcgag 
acttacgagg 
gcagtgaaca 
tgtctggaag 
aatttcaagg 
taagggtttt 
gtgtgttgac 
ggcaggtgga 
tatctgtact 
tacttgggag 
tgagattgtg 
aaaaagtgtt 
agcacagatc 
agatggctcc 
tgggccctgc 
agcccccttt 
tttcaccccc 
gggtgacctt 

gcggtgttta 
atcgaacggc 
gcgtcatgca 
gttctctgat 
ctgcccctgg 
gtgagtgagg 
ccgggtgtcc 
tgtagggtga 
tggtccccgg 
gtcacgtgca 
gaggcgcggc 



gagggcccag 
ctggtcctcc 
tggacacggt 
ttcaccttca 
gaggaggctg 
cacagacgct 
gtgggaatga 
gcaggtgcac 
ggccaggtgc 
tcacctgagg 



gctgaggcag 
ccattgtact 
cgttgattgt 
ctggtcccat 
acctgctgag 
cgtgtcccca 
tggctcccag 



gtgagtgagg 
ctgggtgtcc 
tgctcacttg 
cccattgcct 
gctcggctct 
ctctcgcctc 
ggcctggc 



tgtcl 



itgc 



catcccagaa 
gtcctgtttt 
tcaccttatt 
cctcacatgg 
gacccacgtg 
gttttgaatt 
ggtttattct 

tgttcttaaa 
tttctttgtg 
ttctttagct 
aagatatgta 
ctaactcagt 
tttgtgatct 
aatagtgggc 
ctccttctag 
tctgttcatt 
ggtagaattt 

attttcctgc 
ttttgagaca 
gtgtaacttt 

tgctgtgttg 
tcccaaagtg 



cttggggctc 
ttggcactcc 
cagcaggttg 
agctgcctca 
acccagtttt 
caggactctg 
cttatgcagg 
cgttgccccc 
ctgtcccgtg 
gtgaggcgcc 
gtgtccctgt 
gggtgagtga 
cccagggtgt 
ggtatagggt 
cgcggccccc 
ctgtctcgtg 
agcttgctcc 
gggtagatgg 
tcttggtcac 
ccgcgtgcca 

cgaggctgga 
agggttctct 
ctcccaagct 
ctgggcacct 
attgacgtcc 
gagggccggt 

ttcattcctt 
tgcaccctgt 
atacttcaaa 
cacgctgtgt 
tattctgtga 
gagtatcaag 
tgtgtagtgg 
agtgtgtgca 

atgcatgaaa 
tcttctcgtt 



tgtgtctgtt 
gagtcttggt 
taccttctgg 

cccaggctgg 

agtgtgggta 
tgtttaactg 
actctgattc 
ggttgccatg 
gggcatttgc 



gccccagagc 
tgtctccatc 
gatctctgcc 
cgttttgatg 
ggcgcggcag 
ctggcgaggg 
gaggtgggga 
gtggtcagcc 
ggtggctcac 
tcaggagttt 
aaattagctg 
gagaatcact 
ccagcctggg 
gccaggacag 
ctttaggtat 
gaagggacag 
ccctgttttt 
tgctcccagg 
ctcccaagac 
ttttttttct 
taacaccgtt 
cttgaaatgc 
cacctgctgc 
gctttttgtg 
cctgtcattg 
gagtgaggcg 
aggtgtccct 
cagcgtgatt 
atccccgggt 
cacgtgcagg 
ggcgcggtcc 
ccctgtcacg 
gagtgaggca 
gggtgtccct 
tagggtgagt 
tgaatgtttg 
tgcaggcgca 
ctctccgttc 
ggcactgcag 
tgcccgccac 
ctctgggctg 
gtgccctgaa 



gcc. 



gccgctcatt 
agccacaggt 
gtctccgcca 
acctctgacg 
ttctagcttc 
gttttgatgt 
gtgttaatac 
tttgacgtga 
tttctttgag 
atacgtagag 



:tgta1 



tgaatgcagt 
gtcacgtggg 

gacacgcggt 
tggagccggg 
tgcctgcagg 
cgagaacccc 
aatatgcagg 
gccggtaatc 
gagaccagcc 
ggcatggtgg 
tgaacccagg 
cgacaagagt 
ggtagaggga 
gaagagggcc 
tgtttgtggg 
ctggatttga 
ccctaccgtg 
atgtaagact 
ttttttcttt 
ttctgtgtac 
tgcgtcttgc 
ggctcaggtg 

ctgttctctg 
tggtccccgg 
gtcacgtgta 
gaggtgtggc 
gtccctgtca 
gtgagtgagg 
ccgggtgtcc 
tgtagggtga 
ctgtccccgg 
ctcaggtgca 
gaggctctgt 
ctctttctat 
gtgctggtcc 
cattttgcta 
ccacagcttc 
atgcatgctg 
cctgtgtctg 
ggaaagcaag 
ttggccccct 
gcttaggctg 
tggagtgtct 
gccttcgtca 
tttctatctc 
ttagtttagt 
gaagtaatct 

aatcattttg 
cagtgagtta 



aggggctcag 
cacacgtggc 
ctcctgtcca 
ttccaggcgc 
ttgccggcaa 

ctcttcctgg 
tttgtgttta 
ccagcacttt 
tgaccaacat 
tgtgtgcctg 
aggcggaggc 
gaaactctgt 
gggagataag 
acatgggagc 
tgttcagggg 
tgttgaggaa 
gcagctagaa 
tccggccatg 
ttatggtggc 
agtgcagaat 
gtgactggaa 
gaccacgccg 
cttcgttgag 
acttcagatg 
gtgtccctgt 
gggtgagtga 
ccccgggtgt 
cgtgtagggt 
cactgtcccc 
ctctcaggtg 
gtgaggcacc 
gtgtccctgt 
gggtgagtga 
ccccaggtgt 
agccacagct 



ctgccacgtg 



tggti 



ccag , 



ttccaagaag 
tggtagcatt 
gatgagtgaa 



ctgtccat 

gaggccatag 
tatgtgaggc 
tcttttggag 



ttctgccttt aatttatata 
ctgtcgccca gggtgagtgc 
cctgagccgt cctctcacct 
cacctggcta atttttaaat 



ggcatgagcc 
gataacctga 
tgcgtttcct 



tgggtgggtg 
ggctctgcct 
ctgtctgtct 
gacttccctc 
tccattgtat 
catgcctttc 
caacatcagc 



tgaagtttgc 
tgtaaatttg 
gtccagtgca 

attgttaggt 
acttctatgt 
ctgggcttct 

agtggtgtga 
cagcctcctg 
tttttctgga 
gggatccatc 



gtgaat 
ctcgtt 



ittattgaga 



tgttgcatgt 

gccgagtgtg tgtt( 

ctttgcttag tgtt< 

cagtctcact ctgt( 



120
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760 
2820 
2880 
2940 
3000 
3060 
3120 
3180 
3240 
3300 
3360
3420 
3480 
3540 
3600 
3660 
3720 
3780 
3840 
3900 
3960 
4020 
4080 
4140 
4200 
4260 
4320 
ggctggagtg taatggcaca atctcggctc actgcaacct ctgcctcctc ggttcaagca 4380
gttctcattc ctcaacctca tgagtagctg ggattacagg cgcccaccac cacgcctggc 4440 
taatttttgt atttttagta gagataggct ttcaccatgt tggccaggct ggtctcaaac 4500 
tcctgacctc aagtgatctg cccgccttgg cctcccacag tgctgggatt acaggtgcaa 4560
gccaccgtgc ccggcatacc ttgatctttt aaaatgaagt ctgaaacatt gctacccttg 4620 
tcctgagcaa taagaccctt agtgtatttt agctctggcc accccccagc ctgtgtgctg 4680 
ttttccctgc tgacttagtt ctatctcagg catcttgaca cccccacaag ctaagcatta 4740 
ttaatattgt tttccgtgtt gagtgtttct gtagctttgc ccccgccctg cttttcctcc 4800 
tttgttcccc gtctgtcttc tgtctcaggc ccgccgtctg gggtcccctt ccttgtcctt 4860 
tgcgtggttc ttctgtcttg ttattgctgg taaaccccag ctttacctgt gctggcctcc 4920 
atggcatcta gcgacgtccg gggacctctg cttatgatgc acagatgaag atgtggagac 4980 
tcacgaggag ggcggtcatc ttggcccgtg agtgtctgga gcaccacgtg gccagcgttc 5040 
cttagccagt gagtgacagc aacgtccgct cggcctgggt tcagcctgga aaaccccagg 5100 
catgtcgggg tctggtggct ccgcggtgtc gagtttgaaa tcgcgcaaac ctgcggtgtg 5160 
gcgccagctc tgacggtgct gcctggcggg ggagtgtctg cttcctccct tctgcttggg 5220 
aaccaggaca aaggatgagg ctccgagccg ttgtcgccca acaggagcat gacgtgagcc 5280
atgtggataa ttttaaaatt tctaggctgg gcgcggtggc tcacgcctgt aatcccagca 5340 
ctttgggagg ccaaggcggg tggatcacga ggtcaggagg tcgagaccat cctggccaac 5400 
atgatgaaac cccatctgta ctaaaaacac aaaaattagc tgggcgtggt ggcgggtgcc 5460 
tgtaatccca gctactcggg aggctgaggc aggagaattg cttgaacctg ggagttggaa 5520 
gttgcagtga gccgacattg caccactgca ctccagcctg gcaacacagc gagactctgt 5580 
ctcaaaaaaa aaaaaaaaaa aaaaaaaaaa aattctagta gccacattaa aaaagtaaaa 5640 
aagaaaaggt gaaattaatg taataataga ttttactgaa gcccagcatg tccacacctc 5700
atcattttag ggtgttattg gtgggagcat cactcacagg acatttgaca ttttttgagc 5760
tttgtctgcg ggatcccgtg tgtaggtccc gtgcgtggcc atctcggcct ggacctgctg 5820 
ggcttcccat ggccatggct gttgtaccag atggtgcagg tccgggatga ggtcgccagg 5880 
ccctcagtga gctggatgtg cagtgtccgg atggtgcacg tctgggatga ggtcgccagg 5940 
ccctgctgtg agctggatgt gtggtgtctg gatggtgcag gtcaggggtg aggtctccag 6000 
gccctcggtg agctggaggt atggagtccg gatgatgcag gtccggggtg aggtcgccag 6060 
gccctgctgt gagctggatg tgtggtgtct ggatggtgca ggtcaggggt gaggtctcca 6120
ggccctcggt aagctggagg tatggagtcc ggatgatgca ggtccggggt gaggtcgcca 6180 
ggccctgctg tgagctggat gtgtggtgtc tggatggtgc aggtctgggg tgaggtcacc 6240 
aggccctgcg gtgagctggg tgtgcggtgt ctggatggtg caggtctgga gtgaggtcgc 6300 
cagacggtgc cagaccatgc ggtgagctgg atatgcggtg tccggatggt gcaggtctgg 6360
ggtgaggttg ccaggccctg ctgtgagttg gatgtggggt gtccggatgc tgcaggtccg 6420 
gtgtgaggtc accaggccct gctgtgagct ggatgtgtgg tgtctggatg gtgcaggtct 6480 
ggggtgaagg tcgccaggcc cctgcttgtg agctggatgt gtggtgtctg gatggtgcag 6540 
gtctggagtg aggtcgccag gccctcggtg agctggatgt gcagtgtcca gatggtgcag 6600 
gtccggggtg aggtcgccag accctgcggt gagctggatg tgcggtgtct ggatggtgca 6660
ggtctggagt gaggtcgcca ggccctcggt gagctggatg tatggagtcc ggatggtgcc 6720 
ggtccggggt gaggtcgcca gaccctgctg tgagctggat gtgcggtgtc tggatggtac 6780
aggtctggag tgaggtcgcc agaccctgct gtgagctgga tatgcggtgt ccggatggtg 6840 
caggtcaggg gtgaggtctc caggccctcg gtgagctgga ggtatggagt ccggatgatg 6900
caggtccggg gtgaggtcgc caggccctgc tgtgaactgg atgtgcggcg tctggatggt 6960 
gcaggtctgg ggtgtggtcg ccaggccctc ggtgagctgg aggtatggag tccggatgat 7020
gcaggtccgg ggtgaggtcg ccaggccctg ctgtgagctg gatgtgcggc gtctggatgg 7080
tgcaggtctg gggtgtggtc gccaggccct cggtgagctg gaggtatgga gtccggatga 7140 
tgcaggtccg gggtgaggtt gccaggccct gctgtgagct ggatgtgctg tatccggatg 7200 
gtgcagtccg gggtgaggtc gccaggccct gctgtgagct ggatgtgctg tatccggatg 7260 
gtgcaggtct ggggtgaggt caccaggccc tgcggtgagc tggttgtgcg gtgtccggtt 7320 
gctgcaggtc cggggtgagt tcgccaggcc ctcggtgagc tggatgtgcg gtgtccccgt 7380 
gtccggatgg tgcaggtcca gggtgaggtc gctaggccct tggtgggctg gatgtgccgt 7440 
gtccggatgg tgcaggtctg gggtgaggtc gccaggcctt tggtgagctg gatgtgcggt 7500
gtctgcatgg tgcaggtctg gggtgaggtc gccaggccct tggtgggctg gatgtgtggt 7560 
gtccggatgg tgcaggtccg gcgtgaggtc gccaggccct gctgtgagct ggatgtgcgg 7620 
tgtctggatg gtgcaggtcc ggggtgaggt agccaaggcc ttcggtgagc tggatgtggg 7680
gtgtccggat ggtgcaggtc cggggtgagg tcgccaggcc ctgcggttag ctggatatgc 7740
ggtgtccgga tggtgcaggt ccggggtgag gtcaccaggc cctgcggtta gctggatgtg 7800 
cggtgtctgg atggtgcagg tccggggtga ggtcgccagg ccctgctgtg agctggatgt 7860 
gctgtatccg gatggtgcag gtccggggtg aggtcgccag gccctgcagt gagctggatg 7920 
tgctgtatcc ggatggtgca ggtctggcgt gaggtcgcca ggccctgcgg ttagctggat 7980 
atgcggtgtc ggatggtgca ggtccggggt gaggtcacca ggccctgcgg ttagctggat 8040 
gtgcggtgtc cggatggtgc aggtctgggg tgaggtcgcc aggccctgct gtgagctgga 8100 
tgtgctgtat ccggatggtg caggtccggg gtgaggtcgc caggccctgc ggtgagctgg 8160 
atgtgctgta tccggatggt gcaggtctgg cgtgaggtcg ccaggccctg cggtgagctg 8220 
gatgtgcagt gtacggatgg tgcaggtccg gggtgaggtc gccaggccct gcggtgggct 8280 
gtatgtgtgt tgtctggatg gtgcaggtcc ggggtgagtt cgccaggccc tgcggtgagc 8340 
tggatgtgtg gtgtctggat gctgcaggtc cggggtgagt tcgccaggcc ctcggtgagc 8400 
tggatatgcg gtgtccccgt gtccgaatgg tgcaggtcca gggtgaggtc gccaggccct 8460 
tggtgggctg gatgtgccgt gtccggatgg tgcaggtctg gggtgaggtc gccaggccct 8520 
tggtgagctg gatgtgcggt gtccggatgg tgcaggtccg gggtgaggtc accaggccct 8580
cggtgatctg gatgtggcat gtccttctcg tttaag 8616 

<210> 6 

<211> 2089 

<212> DNA 

<213> Homo sapiens 
<400> 6

gtactgtatc cccacgccag gcctctgctt ctcgaagtcc tggaacacca gcccggcctc 60
agcatgcgcc tgtctccact tgcctgtgct tccctggctg tgcagctctg ggctgggagc 120
caggggcccc gtcacaggcc tggtccaagt ggattctgtg caaggctctg actgcctgga 180
gctcacgttc tcttacttgt aaaatcagga gtttgtgcca agtggtctct agggtttgta 240
aagcagaagg gatttaaatt agatggaaac actaccacta gcctccttgc ctttccctgg 300
gatgtgggtc tgattctctc tctctttttt ttttcttttt tgagatggag tctcactctg 360
ttgcccaggc tggagtgcag tggcataatc ttggctcact gcaacctcca cctcctgggt 420
ttaagcgatt caccagcctc agcctcctaa gtagctggga ttacaggcac ctgccaccac 480
gcctggctaa tttttgtact tttaggagag acggggtttc accatgttgg ccaggctggt 540
ctcgaactca tgacctcagg tgatccaccc accttggcct cccaaagtgc tgggtttaca 600
ggctaagcca ccgtgcccag cccccgattc tcttttaatt catgctgttc tgtatgaatc 660
ttcaatctat tggatttagg tcatgagagg ataaaatccc acccacttgg cgactcactg 720
cagggagcac ctgtgcaggg agcacctggg gataggagag ttccaccatg agctaacttc 780
taggtggctg catttgaatg gctgtgagat tttgtctgca atgttcggct gatgagagtg 840
tgagattgtg acagattcaa gctggatttg catcagtgag ggacgggagc gctggtctgg 900
gagatgccag cctggctgag cccaggccat ggtattagct tctccgtgtc ccgcccaggc 960
tgactgtgga gggctttagt cagaagatca gggcttcccc agctcccctg cacactcgag 1020
tccctggggg gccttgtgac accccatgcc ccaaatcagg atgtctgcag agggagctgg 1080
cagcagacct cgtcagaggt aacacagcct ctgggctggg gaccccgacg tggtgctggg 1140
gccatttcct tgcatctggg ggagggtcag ggctttccct gtgggaacaa gttaatacac 1200
aatgcacctt acttagactt tacacgtatt taacggtgtg cgacccaaca tggtcatttg 1260
accagtattt tggaaagaat ttaattgggg tgaccggaag gagcagacag acgtggtggt 1320
ccccaagatg ctccttgtca ctactgggac tgttgttctg cctggggggc cttggaggcc 1380
cctcctccct ggacagggta ccgtgccttt tctactctgc tgggcctgcg gcctgcggtc 1440
agggcaccag ctccggagca cccgcggccc cagtgtccac ggagtgccag gctgtcagcc 1500
acagatgccc aggtccaggt gtggccgctc cagcccccgt gcccccatgg gtggttttgg 1560
gggaaaaggc caagggcaga ggtgtcagga gactggtggg ctcatgagag ctgattctgc 1620
tccttggctg agctgccctg agcagcctct cccgccctct ccatctgaag ggatgtggct 1680
ctttctacct gggggtcctg cctggggcca gccttgggct accccagtgg ctgtaccaga 1740
gggacaggca tcctgtgtgg aggggcatgg gttcacgtgg ccccagatgc agcctgggac 1800
caggctccct ggtgctgatg gtgggacagt caccctgggg gttgaccgcc ggactgggcg 1860
tccccagggt tgactatagg accaggtgtc caggtgccct gcaagtagag gggctctcag 1920
aggcgtctgg ctggcatggg tggacgtggc cccgggcatg gccttcagcg tgtgctgccg 1980
tgggtgccct gagccctcac tgagtcggtg ggggcttgtg gcttcccgtg agcttccccc 2040
tagtctgttg tctggctgag caagcctcct gaggggctct ctattgcag 2089

<210> 7

<211> 687

<212> DNA

<213> Homo sapiens

<400> 7

gtggctgtgc tttggtttaa cttccttttt aaacagaagt gcgtttgagc cccacatttg 60
gtatcagctt agatgaaggg cccggaggag gggccacggg acacagccag ggccatggca 120
cggcgccaac ccatttgtgc gcacagtgag gtggccgagg tgccggtgcc tccagaaaag 180
cagcgtgggg gtgtaggggg agctcctggg gcagggacag gctctgagga ccacaagaag 240
cagccgggcc agggcctgga tgcagcacgg cccgaggtcc tggatccgtg tcctgctgtg 300
gtgcgcagcc tccgtgcgct tccgcttacg gggcccgggg accaggccac gactgccagg 360
agcccaccgg gctctgagga tcctggacct tgccccacgg ctcctgcacc ccacccctgt 420
ggctgcggtg gctgcggtga ccccgtcatc tgaggagagt gtggggtgag gtggacagag 480
gtgtggcatg aggatcccgt gtgcaacaca catgcggcca ggaacccgtt tcaaacaggg 540
tctgaggaag ctgggagggg ttctaggtcc cgggtctggg tggctgggga cactggggag 600
gggctgcttc tcccctgggt ccctatggtg gggtgggcac ttggccggat ccactttcct 660
gactgtctcc catgctgtcc ccgccag 687

<210> 8

<211> 494

<212> DNA

<213> Homo sapiens

<400> 8

gtgggtgccg gggacccccg tgagcagccc tgctggacct tgggagtggc tgcctgattg 60
gcacctcatg ttgggtggag gaggtactcc tgggtgggcc gcagggagtg caggtgaccc 120
tgtcactgtt gaggacacac ctggcaccta gggtggaggc cttcagcctt tcctgcagca 180
catggggccg actgtgcacc ctgactgccc gggctcctat ccccaaggag ggtcccactg 240
gattccagtt tccgtcagag aaggaaccgc aacggctcag ccaccaggcc ccggtgcctt 300
gcaccccagt cctgagccag gggtctcctg tcccgaggct cagagagggg acacagcccg 360
ccctgccctt ggggtctgga gtggtggggg tcagagagag agtgggggac accgccaggc 420
caggccctga gggcagaggt gatgtctgag tttctgcgtg gccactgtca gtctcctcgc 480
494

<210> 9

<211> 865

<212> DNA

<213> Homo sapiens

<400> 9














gtaaggttca cgtgtgatag tcgtgtccag gatgtgtgtc tctgggatat gaatgtgtct 60


agaatgcagt 


cgtgtctgtg 


atgcgtttct 


gtggtggagg 




atttacacat 


120 


ctgtgatatg cgtgtgtggc acgtgtgtgt cgtggtgcat gtatctgtgg cgtgcatatt 180


tgtggtgtgt gtgtgtgtgg cacgtgtgtg tccatggtgt gtgtgcctgt ggtgtgcatg 240


























gggLgcgcgt 




ggccccttgg 


ccttactcct 


tcctcctcca 


ggcatggtcc 


gcaccattgt 




420 


tcgggtgctg gtttggggag ctccacattc agggtcctca cttctagcat gggtgcccct 480


gtcctgtcac agggctgggc cttggagact gtaagccagg tttgagagga gagtagggat 540
















tccatgagat ataggaaggc tgattcaggc ctcgctcccc gggacacact cctcccaqag 660




ccttggggct 


cggcaggggt 


gaaaggggcc 
































acaagg 




acctctttct 


ctgacttctt 


gagct 








865 


<210> 10 














<211> 3782 














<212> DNA 














<213> Homo sapiens














<400> 10 




































gggnc ugcag 




















ggtgtggagg cctcccctgg gctccctgtt ctgtttcttc cactctgggg tcgtgtggtg 240


cctgctgtgg tgtgtggccg gtgggcaggg cttccaggcc tccttgtgtt cattggcctg 300


gatgtggccc 


tggctacgct 


ccgtccttgg 


aattcccctg 


cgagttggag 




360 


tttctttttt 


tctttctttt 


tttttttttt 


tgataacaga 


gtctcgctct 




420 


ggctggagtg 


gtttggcgtg 






gtgcttcctg 


agttcaagca 


480 


attctcttgc ctcagcctcc caagtagctg gaattatagg cgcccaccac catgctgact 540


aatttttgta 


attttagtag 


agacgaggtt 




ggccaggctg 


gtctcgaact 


600 


cctgacctca ggtgatcctc ccacctcggc ctcccaaagt gctgggatga caggtgtgaa 660


ccgccgcgcc cggccgagac tcgcttcctg cagcttccgt gagatctgca gcgatagctg 720


cctgcagcct tggtgctgac aacctccgtt ttccttctcc aggtctcgct aggggtcttt 780


ccatttcatg 




cagaagagtt 


tcacgtgtgc 


tgatttcccg 


gctgtttcct 


840 


gcgtaattgg tgtctgctgt ttatcgatgg cctccttcca tttcctttag gctttgttta 900


ttgttgtttt 


tccggctcct 


tgaaggaaaa 


gtttcgatta 


tggatgtttg 




960 



tctaaacaag catctgaagt tgccgttttc cctctaaagc agggatcccg aggcccctgg 1020 

ctgtggagtg gcaccggtct ggggcctgtt aggaacccgg cgcacagcgg gaggctaggt 1080 

ggggtgtggg gagccagcgt tcccgcctga gccccgcccc tctcagatca gcagtggcat 1140 

gcggtgctca gaggcgcaca caccctactg agaactgtgc gtgagagggg tctagattct 1200

gtgctcctta tgggaatcta atgcctgatg atctgaggtg gaaccgtttg ctcccaaaac 1260

catccccttc cccactgctg tcctgtggaa aaatcgtctt ccacgaaacc agtccctggt 1320 

accacaatgg ttggggaccc tgtgctaaag acctgcttca gcagcctctc gtcagtgttg 1380 

atatattggc ttttctgtgt tgagtccaga ataattacgg atttctgtga tgctttccgc 1440 

cgacctcaga cccatgggct atttgtgggc gtgttgcctg ctcctgggtt gggaagggtg 1500 

caggccccat gtaccttcct gttactgcct tccaggttgg ttctcagggt tgaatcgtac 1560 

tcgatgtggt tttagcccac ggccctgccg ccagctcctg ggggctgggg aacatgctga 1620

agcacagagt caccgtgcgc gtcttttgat gcctcacaag ctcgaggcct cctgtgtccg 1680 

tgttagtgtg tgtcacgtgc ctgctcacat cctgtcttgg ggacgcaggg gcttagcagg 1740 

tcccgtagta aatgacaagc gtcctggggg agtctgcaga ataggaggtg ggggtgccgg 1800 

tctctctccc gcgtcttcag actcttctcc tgcctgtgct gtggctgcac ctgcatccct 1860 

gcaatccctc cagcactggg ctggagaggc ccgggagctc gagtgccact tgtgccacgt 1920

gactgtggat ggcagtcggt cacgggggtc tgatgtgtgg tgactgtgga tggcggttgg 1980 

tcacaggggt ctgatgtgtg gtgactgtgg atggcggtcg tggggtctga tgtggtgact 2040 

gtggatggcg gtcgtggggt ctgatgtgtg gtgactgtgg atggcggtcg tggggtctga 2100 

tgtggtgact gtggatggcg gtcgtggggt ctgatgtggt gactgtggat ggcggtcgtg 2160 

gggtctgatg tggtgactgt ggatggcagt cgtggggtct gatgtgtggt gactgtggat 2220 

ggcggtcgtg gggtctgatg tggtgactgt ggatggcagt cgtggggtct gatgtgtggt 2280 

gactgtggat ggcggtcgtg gggtctgatg tgtggtgact gtggatggcg gtcgtggggt 2340 

ctgatgtgtg gtgactgtgg atggcggtcg tggggtctga tgtgtggtga ctgtggatgg 2400 

cggtcgtggg gtctgatgtg gtgactgtgg atggcggtcg tggggtctga tgtgtggtga 2460 

ctgtggatgg tgatcggtca caggggtctg atgtgtggtg actgtggatg gcggtcgtgg 2520 

ggtctgatgt gtggtgactg tggatggtga tcggtcacag gggtctgatg tgtggtgact 2580 

gtggatggcg gtcgtggggt ctgatgtgtg gtgactgtgg atggcggttg gtcccggggg 2640 

tctgatgtgt ggtgactgtg gatggcgatc ggtcacaggg gtctgatgtg tggtgactgt 2700 

ggatggcggt cgtggggtct gatgtgtggt gactgtggat ggcggtcgtg gggtctgatg 2760 

tgtggtgact gtggatggcg gtcgtggggt ctgatgtggt gactgtggat ggcggtcgtg 2820 

gggtctgatg tggtgactgt ggatggcggt cgtggggtct gatgtgtggt gactgtggat 2880

ggcggttggt cccgggggtc tgatgtgtgg tgactgtgga tggcggtcgt ggggtctgat 2940 

gtggtgactg tggatggcag tcgtggggtc tgatgtgtgg tgactgtgga tggcggtcgt 3000 

ggggtctgat gtgtggtgac tgtggatggc ggtcgtgggg tctgatgtgt ggtgactgtg 3060 

gatggcggtc gtggggtctg atgtgtggtg actgtggatg gcggtcgtgg ggtctgatgt 3120 

ggtgactgtg gatggcggtc gtggggtctg atgtgtggtg actgtggatg gtgatcggtc 3180 

acaggggtct gatgtgtggt gactgtggat ggcggtcgtg gggtctgatg tgtggtgact 3240 
tgtggtgact gtggatggcg gtcgtagggt ctgatgtgtg gtgactgtgg atggcagtcg 3360
gtcacagggg tctgatgtgt ggtgactgtg gatggcggtc gtggggtctg atgtgtggtg 3420
actgtggatg gcggtcgtgg ggtctgatgt gtggtgactg tggatggcgg tcgtggggtc 3480
tgatgtgtgg tgactgtgga tggcggtcgt ggggtctgat gtggtgactg tggatggtga 3540
tcggtcacag gggtctgatg tgtggtagct gcaggtggag tcccaggtgt gtctgtagct 3600
actttgcgtc ctcggccccc cggcccccgt ttcccaaaca gaagcttccc aggcgctctc 3660
tgggcttcat cccgccatcg ggcttggccg caggtccaca cgtcctgatc ggaagaaaca 3720
agtgcccagc tctggccggg gcaggccaca tttgtggctc atgccctctc ctctgccggc 3780
ag 3782

<210> 11

<211> 980

<212> DNA

<213> Homo sapiens

<400> 11

gtctgggcac tgccctgcag ggttgggcac ggactcccag cagtgggtcc tcccctgggc 60
aatcactggg ctcatgaccg gacagactgt tggccctggg gggcagtggg gggaatgagc 120
tgtgatgggg gcatgatgag ctgtgtgcct tggcgaaatc tgagctgggc catgccaggc 180
tgcgacagct gctgcattca ggcacctgct cacgtttgac tgcgcggcct ctctccagtt 240
ccgcagtgcc tttgttcatg atttgctaaa tgtcttctct gccagttttg atcttgaggc 300
caaaggaaag gtgtccccct cctttaggag ggcaggccat gtttgagccg tgtcctgccc 360
agctggcccc tcagtgctgg gtctgaggcc aaaggaaacg tgtccccctt cttaggagga 420
cgggccgtgt ttgagccacg ccccgctgag cgggcctctc agtgctgggt ctgtccacgt 480
ggccctgtgg ccctttgcag atgtggtctg tccacgtggc cctgtggctc tttgcagatg 540
cctgttagca cttgctcggc tctaggggac agtcgtgtcc accgcatgag gctcagagac 600
ctctgggcga atttccttgg ctcccagggt gggggtggag gtggcctggg ctgctgggac 660
ccagaccctg tgcccggcag ctgggcagca actcctggat cacatatgcc atccgggcca 720
cggtgggctg tgtgggtgtg agcccagctg gacccacagg tggcccagag gagacgttct 780
gtgtcacaca ctctgcctaa gcccatgtgt gtctgcagag actcggcccg gccagcccac 840
gatggccctg cattccagcc cagccccgca cttcatcaca aacactgacc ccaaaaggga 900
cggagggtct tggccacgtg gtcctgcctg tctcagcacc caccggctca ctcccatgtg 960
tctcccgtct gctttcgcag 980

<210> 12
<211> 2485
<212> DNA

<213> Homo sapiens
<400> 12

gtgagtcagg tggccaggtg ccattgccct gcgggtggct gggcgggctg gcagggcttc 60
tgctcacctc tctcctgccc cttccccact gnccttctgc ccggggccac cagagtctcc 120
ttttctggcc cccgccccct ccggctcctg ggctgcaggc tcccgaggcc ccggaaacat 180
ggctcggctt gcggcagccg gagcggagca ggtgccacac gaggcctgga aatggcaagc 240
ggggtgtgga gttgctcctg cgtggaggac gaggggcggg gggtgtgtct gggtcaggtg 300
tgcgccgagc gtttgagcct gcagcttgtc agctccaagt tactactgac gctggacacc 360
cggctctcac acgcttgtat ctctctctcc cgatacaaaa ggattttatc cgattctcat 420
tcctgtccct gtcgtgtgac ccccgcgagg gcgcgggctc ttctctctgt gactagattt 480
cccatctgga aagtgcgggg ttgaccgtgt agtttgctcc tctcgggggg cctgtggtgg 540
ccatggggca ggcggcctgg gagagctgcc gtcacacagc cactgggtga gccacactca 600
cggtggtaga gccacagtgc ctggtgccac atcacgtcct ctggatttta agtaaaacca 660
cacacctccc ggcaggcatc tgcctgcgac cctgtgtgtg cctggggaga gtggtagcac 720
ggaggaaatt cgtgcacact caaggtcatc agcaaggtca tccgcagtca ggtggaacgt 780
ggaggcctct ctctgggatc gtctccagcg gataaaggac tgtgcacagc ttcggaagct 840
tttatttaaa aatataacta ttaattattg cattataagt aatcactaat ggtatcagca 900
attataatat ttattaaagt ataattagaa atattaagta gtacacacgt tctggaaaaa 960
cacaaattgc acatggcagc agagtgaatt ttggccgagg gacacgtgtg cacatgtgtg 1020
taagcggccc ccaggcccac agaattcgct gacaaagtca cctccccaga gaagccacca 1080
cgggccccct tcgtggtcgt gaattttatt aagatggatc aagtcacgta ccgtccacgt 1140
gtggcagggc tttggggaat gtgaggtgat gactgcgtcc tcatgccctg acagacagga 1200
ggtgactgtg tctgtcctgt ccctaggaca cggacaggcc cgaagctcta gtccccatcg 1260
tggtccagtt tggcctctga ataaaaacgt cttcaaaacc tgttgcccca aaaactaaga 1320
acagagagag tttcccatcc catgtgctca caggggcgta tctgcttgcg ttgactcgct 1380
gggctggccg gactcctaga gttggtgcgt gtgcttctgt gcaaaaagtg cagtcctctt 1440
gcccatcact gtgatatctg caccagcaag gaaagcctct tttcttttct ttcttttttt 1500
ttttttgaga cggaacgtca ctgttgtctg cctgggcttg agtgcagtgg cgcgatctca 1560
actcactgca acctccgcct cccgggttcc agcatttctc ctgcctcagc ctcccgagca 1620
gctgagatta caggcaccca ccccctgcgc ctggctaatt tttgtatttt tagtagagag 1680
gggtttttgc catgttggcc aggctggtct cgaactcctg acctcaggtg atccacccac 1740
ctcggcctcc caaagtgccg ggattacagg tgtgagccat cacgcccagc cggaaagcct 1800
ctttttaagg tgaccaccca tagcgcttcc cgaaaataac aggtcttgtt tttgcagtag 1860
gctgcaagcg tctcttagca acaggagtgg cgtcctgtgg gctctgggga tggctgaggg 1920
tcgcgtggca gccatgcctt ctgtgtgcac ctttaggttc cacggggcta ttctgctctc 1980
actgtttgtc tgaaaacgca cccttggcat ccttgtttgg agagtttctg cttctcgttg 2040
gtcatgctga aactaggggc aaggttgtat ccgttggcgc gcagcggcta catgtagggt 2100
catgagtctt tcaccgtgga caaattcctt gaaaaaaaaa aaaggagtcc ggttaagcat 2160
tcattccggg tcaagtgtct ggttctgtga ataaactcta agatttaaga aaccttaatg 2220
aaagaaaacc ttgatgattc agagcaagga tgtggtcaca cctgtggctg gatctgtttc 2280
ttcctggtgc

caggctctgt 
gccctgcatg 
acccaggttc 
tcccagcagg 
cacttgcatg 
ggttcaactc 
tctagggtac 
tgtgctgcac 

ttctcattgt 
gcaatagttt 
actcatcctt 
tccagtctat 

atatacccag 
atcaccacac 
gtgttctggt 
gaacaagcag 
tctcgaagac 
gtacagtgga 
gtaacagaaa 
ctcgagctgg 
ggggcgctgg 
ggatctgggt 
tggctcagag 
ctaagagtct 
aattgcacaa 
gttaactgta 
gagaatgtta 
atactacgta 



ggggggcttg 
tctggagacc 
atgaagccgc 
ccagcggcca 
atgagcatgt 
cgttagggtc 
ccctcgacag 
gggtctacac 
agctggcttt 
atgtgcacga 
ccattaactc 
atcccatgac 
tcagttccca 
gctcagagtg 
ttttatgact 
catcgatgga 
catacgtgtg 
taatgggatg 
tgtcttccac 
gctggagagg 
acagttagtg 
tccgggtttt 
tcaaggttct 



tgatgcattc 
aaccagacaa 
tgaggactgc 
cagcaggcgg 
acaccagctt 
gcagcctgag 
cttcagccca 
atatcgtgcc 



agccgcccca gtgcatggtg agagtgggga gcagggattg tttgttcaga ggtctcatct 2340 

ggtatgtttc tgaggtgttt gccggctgaa tggtagacgt gtcgtttgtg tgtatgaggt 2400 

tctgtgtctg tgtgtggctc ggtttgagtg tacgcatgtc cagcacatgc cctgcccgtc 2460 

tctcacctgt gtcttcccgc cccag 2485 

<210> 13 

<211> 1984 

<212> DNA 

<213> Homo sapiens 

<400> 13 
gtgaggcctc 
agtgttaata 
ggttgcagcc 
gggccccacg
gagggccgct 
ctgtcacgtc 
gccccacatc 
ttgcccatcc 
aacctaatgt 

gccatgttgg 

tgtccaagtg 
ttctttcctt 
aaggacatga 
attttcttaa 
gtgaatagtg 
tcctttgggt 
tccttgagga 
cagtgtaaaa 

cagccatttc 
attatagcat 
ataagtttat 
ggaaagtgtc 

cctgtggata 
gggcaccacg 
ttatttttcc 
gctggcacag 
tgctgtagca 
aaccgctttg 
tcgtagacag 

<210> 14 

<211> 1871 

<212> DNA

<213> Homo sapiens



ggtgggggtt 
atgactgctc 
acgggagggg 
tgtccagagg 

cttggggaga 
gtggcctgga 
ccaaggacgc 
tattgacagc 
cgtgcaggtt 

aggccctggt 
cctgtgagtg 
atggtttcca 
gcatagtatt 
catttgggtt 
catgtgtctt 
gctgggtcaa 
aatggttgaa 
atgtggacag 
aaggatgcgt 
tcctgtgcat 



cggcacactg 
gcttgggcct 
ctcggatcat 
ggggcgaggt 
gagaagtggg 
gctgatggta 
gagagctcgt 
ctttatttat 
aaaagtgtaa 



gtcagccctc 
gagggtcaca 
gctgaggacc 
tcccagcccc 
gccgcgcctg 

ctgttggaaa 
ggctgtgtaa 
agttaacctt 



gatttgcttt 
tgtcttgagg 
ttgcacagcc 
cctcagggct 
ccgaggaagc 
tggggctggt 
ctgggcgcct 
acacacctaa 
agttactttt 
agttacatat 
ttaggtatat 
gtgtgatgtt 
agaacatgtg 
gcttcgtcca 
ccgtggtgta 
ggttgcaagt 
tatagcagca 

ctagtttaca 
cagttatttt 
caggaagcct 
cttttgaaac 
ggttcaagtt 
aacttgctct 
tgggacagga 
cagtgcacca 
acagctgcca 

atggccttcg 
acttataatg 
gaaatttaag 
attgtttgac 
gctgtgtatt 



gtgtttggtt 
tgtccctaca 
tatgtgccac 

tgatttataa 
tagttctaga 
ctcccaccaa 
tttatgaaaa 
gcaggccaca 
tctagctcca 
ctagattgaa 
gggatttgga 
tacctctggc 
tgcccagctt 
tgctggtaaa 
ccgtcttcag 
ttcgtcttca 
aatgaggaat 



120 

180 

240 

300 

360 

420 

480 

540 

600 

660 

720 

780 

840 

900 

960 

1020 

1080 

1140 

1200 

1260 

1320 

1380 

1440 

1500 

1560 

1620 

1680 

1740 

1800 

1860 

1920 

1980 



<400> 14 

gtgaggcccg 

cccccgtgtc 

gtgctggatc 

caccttcggg 

acacacgtgg 

ggcggctcct 

ggcggcagcc 

gctggcccac 

agaatttgga 

cagagttgat 

gggattgtcc 

agtgcgattt 

cggccgccag

gcctggcgtt 

gctgagggga 
ggccagggag 
gatgagtcgg 
ggagctgcgc 
ggggccacag 
gggcacaggg 
cttgacgtga 



tgccgtgtgt 
ctgcccctgg 
cgcaagagca 
agggagtggg 
tgagtgcagg 
ggggccccag 
tcctccccag 
agcgttcgct 
tttgctgagt 
ttttgtgaat 
aatgtggtcc 
gacgagggac 
gggtggtttc 

ccttgtgccg 



aggagaaaac 



agctgacgac 
aggcaaagtc 



ctgtggggac 
caccgcagcg 
gaggcgcttg 
taccgtgcag 
cggtgacctg 
tgagaccccc 
ggtgcacctg 
gcggtcacgt 
gctgctgtct 

ccctcaaggg 
gagaaacctt 
aggtgctttg 
tccaggtcca 
cagcccggag 
ccaggtccca 
ggggaacgct 
gtgtatgttg 
acaggaaggg 
ggtcccaggg 
aggaagggaa 
agctgggtga 
tggtgtitgcc 



ctccacagcc 
ttgtctctgc 
gccgtgcacc 
gccctggtcc 
gctcctgctg 
aggagctgtg 
agcctgcgga 



gcttctgtgt 

gtggccacag 
ccaggccaca 

gcgaggctca 
cagctcacag 
taaagcacag 
: gtcttaaaag 



tgtgggcttt 

caggcctggg 
tgcagagacg 
ctctttggaa 
cacagggccc 
gagcaggagc 
ggttgtttgg 
agatggctag 
gggacctggc 
agccggtggg 
aagggaaccc 
tttgtgaaaa 
ccgccctggg 
tgtgcacatt 
ggctcaggag 
ggcaagctcc
gggggcagaa 
ggagctggga 
ggaagggcag 
aggccagagc 
tgactcggcg 
cccagccagg 
cagatgcctt 
aaggtgggat 



gcagttgagc 
ctctctgccg 
ggcgcagggg 
cacccaggtt 
agtcaagagt 
gcagggccga 
tgctgagtga 
gatcggtggg 
gagtgggttt 
ctcagcacag 
cttgttttaa 

cccatttgga 
ctgggggtat 

tcctgaggct 
tgagggtgct 

atgcaccagg 
ggggacgccc 
agaggctacc 
agggaacctc 
tcccgcgcct 
cagggcatct 
ggtggcaatt 
attgtggtgt 



600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 



Le A 32 805-Foreign Countries 



tgccatgggg acacatgaga tggaccatca cagaggccac tggggctgca cctcccatct 1620 
gagtcctggc tgtcccgggt ccaggccagg ttcttgcatg ctcacctacc tgtcctgccc 1680 
gggagacagg gaaagcaccc cgaagtctgg agcagggctg ggtccaggct cctcagagct 1740
cctgccaggc ccagcaccct gctccaaatc accacttctc tggggttttc caaagcattt 1800 
aacaagggtg tcaggttacc tcctgggtga cggccccgca tcctggggct gacattgccc 1860 



<210> 15 
<211> 3801 
<212> DNA 
<213> Homo sapiens



<400> 15 

gtgagcgcac 

ccgttgcgtc 

ggccacaggg 

tgagggtgct 

cagacctggg 

gggccctgct 

cagggccctt 
agtctacagg 
tgtggggggg 
gccccgtctc 
ctgtttcttt 



ctggccggaa 

tgcccctcgt 
cacaacggga 
tgcactgagg 
gggcgtgagt 
cgagaccctg 
ttgggcgtga 
atgccatgag 

aggctcagac 
tatgaataaa 
cactttggga 
caacatagtg 
cacgcctgta 
gcagaggttg 



gtggagcctg 
tccgtgtggg 
cccatctggg 
gcagttttct 
tgtcttcaga 
ctctcaaacc 
gggccctgct 
gtctctccgc 
ttcatgatca 
aattctgggg 
acaaatgaat 
aagtatcaac 
ggccgaggtg 



cctggtggca 
aacccaggag 
acagagtgag 
ggacaggtgt ttttttati 
gaactggggg tgccttcd 
tggttgttaa accagaggi 
tggactttgc 
tggacaccct 
ggtgcagaca 



tgcccggctg 
gcaggcgact 
gctgagcaga 
gtgctatttt 
aagcagtctg 
cgaacacagg 
gggcgtgagt 
tgtgagcccc 
cgtgtgaccc 
tcttgtttcc 
tgaagatgga 
attccaggca 
ggtggatcac 

tgcgggaggc 



:cggt 
ctggcctcca 

cttattttgc 
gcacagcatc 

catcagatgt 
gtgtgcttgc 
gggtgaactc 
gaagaaaaca 
ttcttgtcca 
gatggacaga 
gtatgtggca 
gactggaagc 
aggagacaca 
aggtgaacgt 
tgaggcaacg 
atggccattc 

aggggagcag 

agggtgagcc 
gatgcacact 

atgctggctc 
acttttctgg 

ccaggcaggg 



cgtgatgggg 
cccttgtgca 

ctggctttgt 
aggcacctct 
tccccatgaa 
agtgaatgtt 
aacacattgc 
gggtccaatg 
agaggtggct 
acatcctctg 
ggcaaaatga 
gattttagtc 
acaatagaac 
cagctgatgg 
aaataagttg 

tccctggttt 
ggcattgctt 
cctggagcgt 
ctctcacttt 
ccgcccttgg 
ggggcccttg 
ggagcccaag 
tgggaaggtc 
cctcaaagaa 
cttttctggg 
aaagcagctt 
tctgaagtga 
ggacttgcca 



gttt. 



atgc; 



atggc 



gggcagtgag 
ctggagacac 



acaaaacgtt 
gagcctgccg 
tggtggtgag 
catgtgtgcc 
ccctgagatt 
cctgagtcac 
atgtgttttt 
cggccgtgcg 
atgcagagtc 
gggctgcagc 
gagcaagctt 
ctgtgtctca 



tgtccttcga 
tgaaaggcac 
taaactgggg 
aatgctccct 
gagcagcagg 
tggtgcccag 

ctgcatgatt 
gcagtgctgg 
atgtattttt 
attgaaggac 
aaagccacag 
ccagaatatt 
ctaaaagctc 
tgtctgaagt 
ttaagaaaag 
tcccaaacca 
aaaacggaag 
aaaagagagt 
tgtctttaca 
accagcaaca 
ggtgttgggg 
tcactgcaga 
ttgtgcacgt 
gttctcctaa 
tcacccagct 
ctctgcccga 
gtcgtgttgg 
ctaccagcag 
acgcacgtga 
cttgccaaga 
gtttgcatgg 

cagcaagtca 
gctctgccat 

tgaatgtcat 



acctgtgttc 
ggctgagtta 
aggtttggat 
tggatggcat 
gcatgcccca 
cgcaggaggg



aaaaaagtat 
taatatttac 
accttcatgg 

ggggtttgct 
tgcagacgcc 
catgtccctg 

ttc 
agt 
taggacaggc 
aaaggacaga 
aggctagtgc 
ctgtgctccc 
agcagtggag 
atacagcaga 
tgaaaaagga 
cagctcagat 
ccctatctct 
gtgtgtgtaa 
gcatatacca 

aaggacacac 
gaaactcagc 

ggcaaagggc 
ggaccccaca 
ggatggctgt 

aactgatggc 
gccagcatca 



tagcagtgtt 
gtgtgttcat 
acatcggtgg 
actggagccc 

actcgaggga 
tgtgcagatc 



gggcaggtgc 
gccaatccca 

ggtaaaagga 
gatccgaacc 
ggccctgctg 

atcaggggac 
ccagagcccg 
cacagatgca 
gggcaaggtg 
ttgaggccag 
aaatacaaaa 
tgaggcagga 
actgcactcc 
cagcattcca 
tggtgctgtg 
gaagagaaat 
ctgagttaac 
tcatggggga 
ctcatgatgg 
ttgcagctcc 
tctcacagcc 
ctgggctccc 
cagctgtgaa 
acccctggtt 



gtagcatttg 
ggcaggacaa 
ggctgggtgt 



aggatgggtg 
aaaggccact 
gcagtggttc 
ggcttgaagg 
aaagtggtaa 
ggtagaatgt 
cagaaacgtg 
tttttttttc 
gagcagattc 
aaaagactca 
agggaggcgg 
ttgcctgagc 
aggcgccctg 
ggtagaggag 
atgcatgatt 
caagtcagac 
gaaagaagaa 
atgcatgtga 
gagacctgtc 
ggttgaggca 
aatgtcctgt 

caaatacagg 
tcaaagaatt 
caaagctgga 

tgtttagctg 

cgcccgggag 
tcatcagggc 

gagtccatgg 
ggaagcggga 
ggggcaggca 



1871 



tgctgcaggg 60
aagggtcaga 120
tctgtgggag 180
aatggtgcac 240 
caagacgccc 300 
ggcatgagtc 360 
ccagagactt 420 
gctcatccac 480 
agggccatgg 540 
agagctcaag 600 
gaaatctgtg 660 
gctcacacct 720 
gagtttgagg 780 
attagcctgg 840 
gaatcatttg 900 
agcctgggca 960 
aaaccatagt 1020 
ctagaggccg 1080 
aagtggtgaa 1140 
agtccagatc 1200 
gcagcaggtg 1260 
gggagtggca 1320 
ctccccacaa 1380 
ttacctggtc 1440 
agcacctctt 1500 
ctgtccactg 1560 
ccagcctctg 1620 
aggaaaatgg 1680 
ggcatcaggt 1740 
tggtcagagt 1800 
gccatactca 1860 
gcatctggga 1920 
gatgggaatt 1980 
ggtcagaact 2040 
tgttaatgtg 2100 
tgagaaaact 2160 
taggtagaag 2220 
aagggaaggg 2280 
atgaaaccag 2340 
cacagtgaaa 2400 
tgaggtcctg 2460 
gaaaggctcc 2520 
gcagcctggc 2580 
ccataggctc 2640 
atggacgtct 2700 
aactgacagc 2760 



:atcc 



; 2820 



agctggaaag 2880 
gtcttcccag 2940 
ttccagtgtt 3000 
gctaaggaga 3060 
tttgaagaat 3120 
tgtaaaagaa 3180 
ggacatacat 3240 
tcctgcccct 3300 
gtgccacctg 3360 
cagtgttctc 3420 
ccagggctcc 3480 
agatgatgag 3540 
ggtcctggtg 3600 
agtgagcacc 3660 
ggaaggcagg 3720 
cctgtgtctg 3780 
3801 



Le A 32 805-Foreign Countries 






<210> 16 

<211> 880 

<212> DNA 

<213> Homo sapiens 

<400> 16 

gtgagcaggc tgatggtcag cacagagttc agagttcagg aggtgtgtgc gcaagtatgt 60 
gtgtgtgtgt gtgcgcgcgt gcctgcaagg ctgatggtga ctggctgcac gtaagagtgc 120 
acatgtacgc atatacacgt gagcacatac atgtgtgcat gtgtgtacat gaaggcatgg 180 
cagtgtgtgc acaggtgtgc aagggcacaa gtgtgtgcac atgcgaatgc acacctgaca 240 
tgcatgtgtg ttcgtgcaca gtcgtgtggg cattcacgtg aggtgcatgc gtgtgggtgt 300 
gcagtgtgag tagcatgtgt gcacataaca tgtattgagg ggtcctcgtg ttcaccccgc 360 
taggtcctca gcaccagtgc cactccttac aggatgagac ggggtcccag gccttggtgg 420 
gctgaggctc tgaagctgca gccctgaggg cattgtccca tctgggcatc cgcgtccact 480 
ccctctcctg tgggcttctg tgtccactcc ccctctcctg tgggcattta catccactcc 540 
actccctctc tcctgtgggc atccgcgtcc actccccctc tctgtgggca tctgcgtcca 600 
cctcccctct ctgtgggcat ttgcgtccac tccctctcct ggttccttcc tgtcttggcc 660 
gagcctcggg ggcaggcaga tgacacagag tcttgactcg cccagggtgg ttcgcagctg 720 
ccgggtgagg gccaggccgg atttcactgg gaagagggat agtttcttgt caaaatgttc 780 
ctctttcttg ttccatctga atggatgata aagcaaaaag taaaaactta aaatcccaga 840 
gaggtttcta ccgtttctca ctctttcttg gcgactctag 880 

<210> 17 

<211> 3186 

<212> DNA 

<213> Homo sapiens 

<400> 17 

gtgagccgcc accaaggggt gcaggcccag cctccaggga ccctccgcgc tctgctcacc 60 
tctgacccgg ggcttcacct tggaactcct gggttttagg ggcaaggaat gtcttacgtt 120 
ttcagtggtg ctgctgcctg tgcacagttc tgttcgcgtg gctctgtgca aagcacctgt 180 
tctccatctc tgggtagtgg taggagccgg tgtggcccca ggtgtcccca ctgtgcctgt 240 
gcactggccg tgggacgtca tggaggccat cccagggcag caggggcatg gggtaaagag 300 
atgtttatgg ggagtcttag cagaggaggc tgggaaggtg tctgaacagt agatgggaga 360 
tcagatgccc ggaggatttg gggtctcagc aaagagggcc gaggtgggtg caggtgaggg 420 
tcgctggccc cacccccggg aaggtgcagc agagctgtgg ctccccacac agcccggcca 480 
gcacctgtgc tctgggcatg gctgtgctcc tggaacgttc cctgtcctgg ctggtcaggg 540 
ggtgcccctg ccaagaatcg acaactttat cacagaggga agggccaatc tgtggaggcc 600 
acagggccag cttctgcctg gagtcagggc aggtggtggc acaagcctcg gggctgtacc 660 
aaagggcagt cgggcaccac aggcccgggc ctccacctca acaggcctcc cgagccactg 720 
ggagctgaat gccaggaggc cgaagccctc gccccatgag ggctgagaag gagtgtgagc 780 
atttgtgtta cccagggccg aggctgcgcg aattaccgtg cacacttgat gtgaaatgag 840 
gtcgtcgtct atcgtggaaa cccagcaagg gctcacggga gagttttcca ttacaaggtc 900 
gtaccatgaa aatggttttt aacccgagtg cttgcgcctt catgctctgg cagggagggc 960 
agagccacag ctgcatgtta ccgcctttgc accagctcca gaggcttggg accaggctgt 1020 
ctcagttcca gggtgcgtcc ggctcagacc gccctcctct ctgccttctc tctctgcctc 1080 
aaatcttccc tcgtttgcat ctccctgacg cgtgcctggg ccctcgtgca agctgcttga 1140 
ctcctttccg gaaacccttg gggtgtgctg gatacaggtg ccactgagga ctggaggtgt 1200 
ctgacactgt ggttgacccc agggtccagc tggcgtgctt ggggcctcct tgggccatga 1260 
tgaggtcaga ggagttttcc caggtgaaaa ctcctgggaa actcccaggg ccatgtgacc 1320 
tgccacctgc tcctcccata ttcagctcag tcttgtcctc atttccccac cagggtctct 1380 
agctccgagg agctcccgta gagggcctgg gctcagggca gggcggctga gtttccccac 1440 
ccatgtgggg acccttgggt agtcgcttga ttgggtagcc ctgaggaggc cgagatgcga 1500 
tgggccacgg gccgtttcca aacacagagt caggcacgtg gaaggcccag gaatcccctt 1560 
ccctcgaggc aggagtggga gaacggagag ctgggccccg atttcacggc agccaggctg 1620 
cagtgggcga ggctgtggtg gtccacgtgg cgctgggggc ggggtctgat tcaaatccgc 1680 
tggggctcgg ccttcctggc ccgtgctggc cgcgcctcca cacgggcttg gggtggacgc 1740 
cccgacctct agcaggtggc tatttctccc tttggaagag agcccctcac ccatgctagg 1800 
tgtttccctc ctgggtcagg agcgtggccg tgtggcaacc ccgggacctt aggcttattt 1860 
atttgtttaa aaacattctg ggcctggctt ccgttgttgc taaatgggga aaagacatcc 1920 
cacctcagca gagttactga gaggctgaaa ccggggtgct ggcttgactg gtgtgatctc 1980 
aggtcattcc agaagtggct caggaagtca gtgagaccag gtacatgggg ggctcaggca 2040 
gtgggtgaga tgaggtacac ggggggctca ggcagtgggt gaggccaggt acatgggggg 2100 
ctcaggcact gggtgagatg aggtacacgg ggggctcagg cagagggtca gaccaggtac 2160 
acgggggctc tgatcacacg cacatatgag cacatgtgca catgtgctgt ttcatggtag 2220 
ccaggtctgt gcacacctgc cccaaagtcc caggaagctg agaggccaaa gatggaggct 2280 
gacagggctg gcgcggtggc tcacacctgt agtcccagca ctttgggagg ccgaggcgag 2340 
aggatccctt gagcccagga gtttaagacc agcctgagca acatagtaga accccatctc 2400 
tatgaaaaat aaaaacaaaa attagctgaa catggtggtg tgcgcctgta gttccaatac 2460 
ttgggaggct gaagtgggag gatcacttga gcccaggagg tggaagctgc agtgagctga 2520 
gattgcacca ctgtactgca gcctgggtga cagagtgaga gcccatctca acaacaacaa 2580 
agaagactga caaatgcagt ttcttggaaa gaaacattta gtaggaactt aacctacaca 2640 
cagaagccaa gtcggtgtct cggtgtcagt gagatgagat gatgggtcct cacaccatca 2700 
ccccagaccc agggtttatg caccacaggg gcgggtggct cagaagggat gcgcaggacg 2760 
ttgatatacg atgacatcaa ggttgtctga cgaagggcag gattcatgat aagtacctgc 2820 
tggtacacaa ggaacaatgg ataaactgga aaccttagag gccttcccgg aacaggggct 2880 
aatcagaagc cagcatgggg ggctggcatc caggatggag ctgcttcagc ctccacatgc 2940 
gtgttcatac agatggtgca cagaaacgca gtgtacctgt gcacacacag acacgcagct 3000 



actcgcacac acaagcacac acacagacat gcatgcatgc atccgtgtgt gtgcacctgt 3060 

gcccatgagg aaacccatgc atgtgcattc atgcacgcac acaggcaccg gtgggcccat 3120 

gcccacaccc acgagcaccg tctgattagg aggcctttcc tctgacgctg tccgccatcc 3180 

tctcag 3186 

<210> 18 

<211> 781 

<212> DNA 

<213> Homo sapiens 

<400> 18 

gtatgtgcag gtgcctggcc tcagtggcag cagtgcctgc ctgctggtgt tagtgtgtca 60 
ggagactgag tgaatctggg cttaggaagt tcttacccct tttcgcatca ggaagtggtt 120 
taacccaacc actgtcaggc tcgtctgccc gccctctcgt ggggtgagca gagcacctga 180 
tggaagggac aggagctgtc tgggagctgc catccttccc accttgctct gcctggggaa 240 
gcgctggggg gcctggtctc tcctgtttgc cccatggtgg gatttggggg gcctggcctc 300 
tcctgtttgc cctgtggtgg gattgggctg tctcccgtcc atggcactta gggcccttgt 360 
gcaaacccag gccaagggct taggaggagg ccaggcccag gctaccccac ccctctcagg 420 
agcagaggcc gcgtatcacc acgacagagc cccgcgccgt cctctgcttc ccagtcaccg 480 
tcctctgccc ctggacactt tgtccagcat cagggaggtt tctgatccgt ctgaaattca 540 
agccatgtcg aacctgcggt cctgagctta acagcttcta ctttctgttc tttctgtgtt 600 

gtggaaattt cacctggaga agccgaagaa aacatttctg tcgtgactcc tgcggtgctt 660 
gggtcgggac agccagagat ggagccaccc cgcagaccgt cgggtgtggg cagctttccg 720 
gtgtctcctg ggaggggagc tgggctgggc ctgtgactcc tcagcctctg ttttccccca 780 

g 781 

<210> 19 

<211> 536 

<212> DNA 

<213> Homo sapiens 

<400> 19 

gcaagtgtgg gtggaggcca gtgcgggccc c&cctgccca ggggtcatcc ttgaacgccc 60 

tgtgtggggc gagcagcctc agatgctgct gaagtgcaga cgcccccggg cctgaccctg 120 

ggggcctgga gccacgctgg cagccctatg tgattaaacg ctggtgtccc caggccacgg 180 

agcctggcag ggtccccaac ttcttgaacc cctgcttccc atctcagggg cgatggctcc 240 

ccacgcttgg gagccttctg acccctgacc tgtgtcctct cacagcctct tccctggctg 300 

ctgccctgag ctcctggggt cctgagcaag ttctctcccc gccccgccgc tccagcgtca 360 

ctgggctgcc tgtctgctcg ccccggtgga ggggtgtctg tcccttcact gaggttccca 420 

ccagccaggg ccacgaggtg caggccctgc ctgcccggcc acccacacgt cctaggaggg 480 

ttggaggatg ccacctctgg cctcttctgg aacggagtct gattttggcc ccgcag 536 

<210> 20 

<211> 3179 

<212> DNA 

<213> Homo sapiens 

<400> 20 

atctcatgtt tgaatcctaa tgtgcactgc atagacacca ctgtatgcaa ttacagaagc 60 

ctgtgagtga acggggtggt ggtcagtgcg ggcccatggc ctggctgtgc atttacggaa 120 

gtctatgagt gaatggggtt gtggtcagtg cgggcccatg gcctggctgg gcctgggagg 180 

tttctgatgc tgtgaggcag gaggggaagg agggtagggg atagacagtg ggagccccca 240 

ccctggaaga cataacagta agtccaggcc cgaagggcag cagggatgct gggggcccag 300 

cttgggcggc ggggatgatg gagggcctgg ccagggtggc agggatgatg ggggccccag 360 

ctggggtggc aggggtgatg gggggggctg gtctgggtgg cggggaagat ggggaagcct 420 

ggctgggccc cctcctcccc tgcctcccac ctgcagccgt ggatccggat gtgcttccct 480 

ggtgcacatc ctctgggcca tcagctttca tggaggtggg gggcaggggc atgacaccat 540 

cctgtataaa atccaggatt cctcctcccg aacgccccaa ctcaggttga aagtcacatt 600 

ccgcctctgg ccattctctt aagagtagac caggactctg atctctgaag ggtgggtagg 660 

gtggggcagt ggagggtgtg gacacaggag gcttcagggt ggggctggtg atgctctctc 720 

atcctcttat catctcccag tctcatcccc catcctctta tcatctccca gtctcatctg 780 

tcttcctcct atctcccagt ctcatctgrc atcctcttac catctcccag tctcatctct 840 

tatcctctta tctcctagtc tcatccagac ttacctccca gggcgggtgc caggctcgca 900 

gtggagctgg acatacgtcc ttcctcaggc agaaggaact ggaaggattg cagagaacag 960 

gaggggcggc tcagagggac gcagtcttgg ggtgaagaaa cagcccctcc tcagaagttg 1020 

gcttgggcca cacgaaaccg agggccctgc gtgagtggct ccagagcctt ccagcaggtc 1080 

cctggtgggg ccttatggta tggccgggtc ctactgagtg caccttggac agggcttctg 1140 

gtttgagtgc agcccggacg tgcctggtgt cggggtgggg gcttatggcc actggatatg 1200 

gcgtcattta ttgctgctgc ttcagagaat gtctgagtga ccgagcctaa tgtgtatggt 1260 

gggcccaagt ccacagactg tgtcgcaaat gcactctggt gcctggagcc cccgtatagg 1320 

agctgtgagg aaggaggggc tcttggcagc cggcctgggg gcgcctttgc cctgcaaact 1380 

ggaagggagc ggccccgggc gccgtgggcg gacgacctca agtgagaggt tggacagaac 1440 

agggcgggga cttcccagga gcagaggccg ctgctcaggc acacctgggt ttgaatcaca 1500 

gaccaacagg tcaggccatt gttcagctat ccatcttcta caaagctcca gattcctgtt 1560 

tctccgggtg ttttttgttg aaattttact caggattact tatatttttt gctaaagtat 1620 

tagaccctta aaaaaggtat ttgctttgat atggcttaac tcactaagca cctactttat 1680 

ttgtctgttt ttatttatta ttattattat tattagagat ggtgtctact ctgtcaccca 1740 
ggttgttagt gcagtggcac agtcatggct cgctgtagcc gcaaaccccc aggctcaagt 1800 
gatcctccgg cctcagcttc ccagagtgct gggattacag gtgtgagcca ctgcccttgc 1860 
ctggcacttt taaaaaccac tatgtaaggt caggtccagt ggcttccaca cctgtcatcc 1920 
cagtagtttg ggaagccgag gcagaaggat tgtctgaggc caggagtttg agaccagcat 1980 
gggtaacata gggagacccc atctctacaa aaaatgcaaa aagttatccg ggcgtggggt 2040 
ccagcatctg tagtcccagc tgctcgggag gctgagtggg aggatcgctt gagcccggga 2100 
ggtcatggct gcagtgagct gtgattgtac catcgcactc cagcctgggc aacagagtga 2160 
gaccctgtct caaaaaaaaa aaaaaaaaaa gaaggagaag gagaagagaa gaagaaggaa 2220 
gaaggaaaga gaagaagaag gaagaaggaa gaaagaagga gaaggaggcc tgctaggtgc 2280 
taggtagact gtcaaatctc agagcaaaat gaaaataaca aagttttaaa gggaaagaaa 2340 
aaccccagct ctttggactt ccttaggcct gaacttcatc tcaagcagct tccttccaca 2400 
gacaagcgtg tatggagcga gtgagttcaa agcagaaagg gaggagaagc aggcaagggt 2460 
ggaggctgtg ggtgacacca gccaggaccc ctgaaaggga gtggttgttt tcctgcctca 2520 
gccccacgct cctgccggtc ctgcacctgc tgtaaccgtc gatgttggtg ccaggtgccc 2580 
acctgggaag gatgctgtgc agggggcttg ccaaactttg gtgggtttca gaagccccag 2640 
gcacttgtgg caggcacaat tacagcccct ccccaaagat gcccacgtcc ttctcctgga 2700 
acctgtgaat gtgtcacccg caaggcagag gctggtgaag gctgcaggtg gaatcacggc 2760 
tgccagtcag ccgatcttaa ggtcatcctg gattatctgg tgggcctgat atggccacaa 2820 
gggtccctag aagtgagaga gggaggcagg ggagagtcag agaggggacg tgagaaggac 2880 
cactggccac tgctggcttt gagatggagg agggggtccc cagccaagga atgggggcag 2940 
ccgctccatg ctggaaaagc aagcaatcct ccccggtcct gagggcacac ggccctgccc 3000 
acgcctcgat ttcaggccag tgggacctgt ttcagctttc cggcctccag agctgtaaga 3060 
tgatgcgttt gtgttcagcc actaagctgc agtgattcgt cacagcagca aatggaatag 3120 
cagtacaggg aaatgaatac agggacagtt ctcagagtga ctctcagccc acccctggg 3179 
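
The listings above give each sequence in ten-base groups, with the cumulative position of the last base at the end of each line and the declared total length in field <211>. A minimal sketch of how such a block could be checked mechanically follows. The function name `check_listing` and the toy input are illustrative assumptions and are not part of the application; the sketch only tests internal consistency of the numbering and cannot recover a misread base.

```python
import re

# One data line: lower-case base groups, whitespace, then the cumulative position.
LINE_PATTERN = re.compile(r"([a-z ]+?)\s+(\d+)")


def check_listing(block, declared_length):
    """Return (sequence, problems) for the data lines of one <400> block.

    Assumes the layout used in the listing above: groups of bases followed
    by the running position of the last base on the line.
    """
    sequence = ""
    problems = []
    for line_no, raw in enumerate(block.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank spacer lines are ignored
        match = LINE_PATTERN.fullmatch(line)
        if match is None:
            # e.g. a position label broken up by stray spaces
            problems.append(f"line {line_no}: cannot parse")
            continue
        sequence += match.group(1).replace(" ", "")
        label = int(match.group(2))
        if len(sequence) != label:
            problems.append(
                f"line {line_no}: label {label}, bases read {len(sequence)}"
            )
    if len(sequence) != declared_length:
        problems.append(
            f"total {len(sequence)} bases, <211> declares {declared_length}"
        )
    return sequence, problems


# Toy input (hypothetical, not taken from the listing).
toy_block = "acgtacgtac gtacgtacgt 20\nacgtacgtac 30\n"
toy_sequence, toy_problems = check_listing(toy_block, 30)
print(len(toy_sequence), toy_problems)
```

A line whose position label is split by a space, or whose label disagrees with the number of bases read so far, is reported rather than silently accepted.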
Patent Claims 



1. Regulatory DNA sequences for the gene for the human catalytic telomerase 
subunit. 

2. DNA sequences according to Claim 1, characterized in that the sequences are 
intron sequences in accordance with SEQ ID NO 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19 and/or 20 or fragments of these sequences which 
have a regulatory effect. 

3. DNA sequences according to Claim 1, characterized in that the sequences are 
the 5'-flanking regulatory DNA sequence for the gene for the human catalytic 
telomerase subunit as depicted in Fig. 10 (SEQ ID NO 3), or fragments of this 
DNA sequence which have a regulatory effect. 

4. Recombinant construct which contains a DNA sequence according to one of 
Claims 1 to 3. 

5. Recombinant construct according to Claim 4, characterized in that it 
additionally contains one or more DNA sequences which encode polypeptides 
or proteins. 

6. Vector which contains a recombinant construct according to Claim 4 or 5. 

7. Use of recombinant constructs or vectors according to one of Claims 4 to 6 
for preparing medicaments. 

8. Recombinant host cells which harbour recombinant constructs or vectors 
according to one of Claims 4 to 6. 
9. Process for identifying substances which affect the promoter activity, silencer 
activity or enhancer activity of the human catalytic telomerase subunit, 
comprising the following steps: 

A. adding a candidate substance to a host cell which harbours DNA 
sequences according to one of Claims 1 to 3, which sequences are 
functionally linked to a reporter gene, and 

B. measuring the effect of the substance on expression of the reporter 
gene. 

10. Process for identifying factors which bind specifically to the DNA according 
to one of Claims 1 to 3, or to fragments thereof, characterized in that an 
expression cDNA library is screened using a DNA sequence according to one 
of Claims 1 to 3, or sub-fragments of widely differing length, as the probe. 

11. Transgenic animals which harbour recombinant constructs or vectors 
according to Claims 4 to 6. 

12. Process for detecting telomerase-associated conditions in a patient, 
comprising the following steps: 

A. incubating a recombinant construct or vector according to Claims 4 to 
6, which additionally contains a reporter gene, with body fluids or cell 
samples, 

B. detecting the activity of the reporter gene in order to obtain a 
diagnostic value, and 
C. comparing the diagnostic value with standard values for the reporter 
gene construct in standardized normal cells or body fluids of the same 
type as the test sample. 
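
The comparison in step C of Claim 12, and the measurement in step B of Claim 9, can be illustrated as a ratio of reporter signal in the test sample to the signal of the standard. The following is only a sketch under stated assumptions: the function names, the use of a simple mean, and the two-fold cut-off are hypothetical choices made for illustration and are not specified in the claims.

```python
from statistics import mean


def relative_reporter_activity(test_readings, standard_readings):
    """Mean reporter signal of the test sample divided by that of the standard."""
    if not test_readings or not standard_readings:
        raise ValueError("both reading lists must be non-empty")
    standard = mean(standard_readings)
    if standard <= 0:
        raise ValueError("standard reporter signal must be positive")
    return mean(test_readings) / standard


def classify(ratio, fold_threshold=2.0):
    """Label a ratio relative to a hypothetical fold-change cut-off."""
    # fold_threshold is an assumption for illustration, not a value from the claims.
    if ratio >= fold_threshold:
        return "elevated"
    if ratio <= 1.0 / fold_threshold:
        return "reduced"
    return "within standard range"


# Toy readings in arbitrary units (hypothetical data).
ratio = relative_reporter_activity([300, 320, 280], [100, 100, 100])
print(ratio, classify(ratio))
```

In practice the standard values would come from the standardized normal cells or body fluids of the same type as the test sample, as the claim requires.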
Regulatory DNA sequences of the gene for the human catalytic telomerase 
subunit, and their diagnostic and therapeutic use 



Abstract 



This invention relates to regulatory DNA sequences, comprising promoter sequences 
and intron sequences, for the gene for the human catalytic telomerase subunit. In 
addition, this invention relates to the use of these DNA sequences for 
pharmaceutical, diagnostic and therapeutic purposes, especially in the treatment of 
cancer and ageing. 
Fig. 2 

Fig. 4 

GAGCTCTGAA CCGTGGAAAC GAACATGACC CTTGCCTGCC TGCTTCCCTG GGTGGGTCAA GGGTAATGAA 70 
GTGGTGTGCA GGAAATGGCC ATGTAAATTA CACGACTCTG CTGATGGGGA CCGTTCCTTC CATCATTATT 140 
CATCTTCACC CCCAAGGACT GAATGATTCC AGCAACTTCT TCGGGTGTGA CAAGCCATGA CAAAACTCAG 210 
TACAAACACC ACTCTTTTAC TAGGCCCACA GAGCACGGGC CACACCCCTG ATATATTAAG AGTCCAGGAG 280 
AGATGAGGCT GCTTTCAGCC ACCAGGCTGG GGTGACAACA GCGGCTGAAC AGTCTGTTCC TCTAGACTAG 350 
TAGACCCTGG CAGGCACTCC CCCAAATTCT AGGGCCTGGT TGCTGCTTCC CGAGGGCGCC ATCTGCCCTG 420 
GAGACTCAGC CTGGGGTGCC ACACTGAGGC CAGCCCTGTC TCCACACCCT CCGCCTCCAG GCCTCAGCTT 490 
CTCCAGCAGC TTCCTAAACC CTGGGTGGGC CGTGTTCCAG CGCTACTGTC TCACCTGTCC CACTGTGTCT 560 
TGTCTCAGCG ACGTAGCTCG CACGGTTCCT CCTCACATGG GGTGTCTGTC TCCTTCCCCA ACACTCACAT 630 
GCGTTGAAGG GAGGAGATTC TGCGCCTCCC AGACTGGCTC CTCTGAGCCT GAACCTGGCT CGTGGCCCCC 700 
GATGCAGGTT CCTGGCGTCC GGCTGCACGC TGACCTCCAT TTCCAGGCGC TCCCCGTCTC CTGTCATCTG 770 
CCGGGGCCTG CCGGTGTGTT CTTCTGTTTC TGTGCTCCTT TCCACCrCCA GCTGCGTGTG TCTCTGCCCG 840 
CTAGGGTCTC GGGGTTTTTA TAGGCATAGG ACGGGGGCGT GGTGGGCCAG GGCGCTCTTG GGAAATGCAA 910 
CATTTGGGTG TGAAAGTAGG AGXGCCTGTC CTCACCTAGG TCCACGGGCA CAGGCCTGGG GATGGAGCCC 980 
CCGCCAGGGA CCCGCCCTTC TCTGCCCAGC ACTTTCCTGC CCCCCTCCCT CTGGAACACA GAGTGGCAGT 1050 
TTCCACAAGC ACTAAGCATC CTCTTCCCAA AAGACCCAGC ATTGGCACCC CTGGACATTT GCCCCACAGC 1120 
CCTGGGAATT CACGTGACTA CGCACATCAT GTACACACTC CCGTCCACGA CCGACCCCCG CTGTTTTATT 1190 
TTAATAGCTA CAAAGCAGGG AAATCCCTGC TAAAATGTCC TTTAACAAAC TGGTTAAACA AACGGGTCCA 1260 
TCCGCACGGT GGACAGTTCC TCACAGTGAA GAGGAACATG CCGTTTATAA AGCCTGCAGG CATCTCAAGG 1330 
GAATTACGCT GAGTCAAAAC TGCCACCTCC ATGGGATACG TACGCAACAT GCTCAAAAAG AAAGAATTTC 1400 
ACCCCATGGC AGGGGAGTGG TTAGGGGGGT TAAGGACGGT GGGGGCGGCA GCTGGGGGCT ACTGCACGCA 1470 
CCT-TTTACTA AAGCCAGTTT CCTGGTTCTG ATGGTATTGG CTCAGTTATG GGAGACTAAC CATAGGGGAG 1540 
TGGGGATGGG GGAACCCGGA GGCTGTGCCA TCTTTGCCAT GCCCGAGTGT CCTGGGCAGG ATAATGCTCT 1610 
AGAGATGCCC ACGTCCTGAT TCCCCCAAAC CTGTGGACAG AACCCGCCCG GCCCCAGGGC CTTTGCAGGT 1680 
GTGATCTCCG TGAGGACCCT GAGGTCTGGG ATCCTTCGGG ACTACCTGCA GGCCCGAAAA GTAATCCAGG 1750 
GGTTCTGGGA AGAGGCGGGC AGGAGGGTCA GAGGGGGGCA GCCTCAGGAC GATGGAGGCA GTCAGTCTGA 1820 
GGCTGAAAAG GGAGGGAGGG CCTCGAGCCC AGGCCTGCAA GCGCCTCCAG AAGCTGGAAA AAGCGGGGAA 1890 
GGGACCCTCC ACGGAGCCTG CAGCAGGAAG GCACGGCTGG CCCTTAGCCC ACCAGGGCCC ATCGTGGACC 1960 
TCCGGCCTCC GTGCCATAGG AGGGCACTCG CGCTGCCCTT CTAGCATGAA GTGTGTGGGG ATTTGCAGAA 2030 
GCAACAGGAA ACCCATGCAC TGTGAATCTA GGATTATTTC AAAACAAAGG TTTACAGAAA CATCCAAGGA 2100 
CAGGGCTGAA GTGCCTCCGG GCAAGGGCAG GGCAGGCACG AGTGATTTTA TTTAGCTATT TTATTTTATT 2170 
TACTTACTTT CTGAGACAGA GTTATGCTCT TGTTGCCCAG GCTGGAGTGC AGCGGCATGA TCTTGGCTCA 2240 
CTGCAACCTC CGTCTCCTGG GTTCAAGCAA TTCTCGTGCC TCAGCCTCCC AAGTAGCTGG GATTTCAGGC 2310 
GTGCACCACC ACACCCGGCT AATTTTGTAT TTTTAGTAGA GATGGGCTTT CACCATGTTG GTCAAGCTGA 2380 
TCTCAAAATC CTGACCTCAG GTGATCCGCC CACCTCAGCC TCCCAAAGTG CTGGGATTAC AGGCATGAGC 2450 
CACTGCACCT GGCCTATTTA ACCATTTTAJ^ AACTTCCCTG GGCTCAAGTC ACACCCACTG GTAAGGAGTT 2520 
CATGGAGTTC AATTTCCCCT TTACTCAGGA GTTACCCTCC TTTGATATTT TCTGTAATTC TTCGTAGACT 2590 
GGGGATACAC CGTCTCTTGA CATATTCACA GTTTCTGTGA CCACCTGTTA TCCCATGGGA CCCACTGCAG 2660 
GGGCAGCTGG GAGGCTGCAG GCTTCAGGTC CCAGTGGGGT TGCCATCTGC CAGTAGAAAC CTGATGTAGA 2730 
ATCAGGGCGC AAGTGTGGAC ACTGTCCTGA ATCTCAATGT CTCAGTGTGT GCTGAAACAT GTAGAAATTA 2800 
AAGTCCATCC CTCCTACTCT ACTGGGATTG AGCCCCTTCC CTATCCCCCC CCAGGGGCAG AGGAGTTCCT 2870 
CTCACTCCTG TGGAGGAAGG AATGATACTT TGTTATTTTT CACTGCTGGT ACTGAATCCA CTGTTTCATT 2940 
TGTTGGTTTG TTTGTTTTGT TTTGAGAGGC GGTTTCACTC TTGTXGCTCA GGCTGGAGGG AGTGCAATGG 3010 
CGCGATCTTG GCTTACTGCA GCCTCTGCCT CCCAGGTTCA AGTGATTCTC CTGCTTCCGC CTCCCATTTG 3080 
GCTGGGATTA CAGGCACCCG CCACCATGCC CAGCTAATTT TTTGTATTTT TAGTAGAGAC GGGGGTGGGT 3150 
GGGGTTCACC ATGTTGGCCA GGCTGGTCTC GAACTTCTGA CCTCAGATGA TCCACCTGCC TCTGCCTCCT 3220 
AAAGTGCTGG GATTACAGGT GTGAGCCACC ATGCCCAGCT CAGAATTTAC TCTGTTTAGA AACATCTGGG 3290 
TCTGAGGTAG GAAGCTCACC CCACTCAAGT GTTGTGGTGT TTTAAGCCAA TGATAGAATT TTTTTATTGT 3360 
TGTTAGAACA CTCTTGATGT TTTACACTGT GATGACTAAG ACATCATCAG CTTTTCAAAG ACACACTAAC 3430 
TGCACCCATA ATACTGGGGT GTCTTCTGGG TATCAGCAAT CTTCATTGAA TGCCGGGAGG CGTTTCCTCG 3500 
CCATGCACAT GGTGTTAATT ACTCCAGCAT AATCTTCTGC TTCCATTTCT TCTCTTCCCT CTTTTAAAAT 3570 
TGTGTTTTCT ATGTTGGCTT CTCTGCAGAG AACCAGTGTA AGCTACAACT TAACTTTTGT TGGAACAAAT 3640 
TTTCCAAACC GCCCCTTTGC CCTAGTGGCA GAGACAATTC ACAAACACAG CCCTTTAAAA AGGCTTAGGG 3710 
ATCACTAAGG GGATTTCTAG AAGAGCGACC TGTAATCCTA AGTATTTACA AGACGAGGCT AACCTCCAGC 3780 
GAGCGTGACA GCCCAGGGAG GGTGCGAGGC CTGTTCAAAT GCTAGCTCCA TAAATAAAGC AATTTCCTCC 3850 
GGCAGTXTCT GAAAGTAGGA AAGGTTACAT TTAAGGTTGC GTTTGTTAGC ATTTCAGTGT TTGCCGACCT 3920 
CAGCTACAGC ATCCCTGCAA GGCCTCGGGA GACCCAGAAG TTTCTCGCCC CCTTAGATCC AAACTTGAGC 3990 
AACCCGGAGT CTGGATTCCT GGGAAGTCCT CAGCTGTCCT GCGGTTGTGC CGGGGCCCCA GGTCTGGAGG 4060 
GGACCAGTGG CCGTGTGGCT TCTACTGCTG GGCTGGAAGT CGGGCCTCCT AGCTCTGCAG TCCGAGGCTT 4130 
GGAGCCAGGT GCCTGGACCC CGAGGCTGCC CTCCACCCTG TGCGGGCGGG ATGTGACCAG ATGTTGGCCT 4200 
CATCTGCCAG ACAGAGTGCC GGGGCCCAGG GTCAAGGCCG TTGTGGCTGG TGTGAGGCGC CCGGTGCGCG 4270 
GCCAGCAGGA GCGCCTGGCT CCATTTCCCA CCCTTTCTCG ACGGGACCGC CCCGGTGGGT GATTAACAGA 4340 
TTTGGGGTGG TTTGCTCATG GTGGGGACCC CTCGCCGCCT GAGAACCTGC AAAGAGAAAT GACGGGCCTG 4410 
TGTCAAGGAG CCCAAGTCGC GGGGAAGTGT TGCAGGGAGG CACTCCGGGA GGTCCCGCGT GCCCGTCCAG 4480 
GGAGCAATGC GTCCTCGGGT TCGTCCCCAG CCGCGTCTAC GCGCCTCCGT CCTCCCCTTC ACGTCCGGCA 4550 
TTCGTGGTGC CCGGAGCCCG ACGCCCCGCG XCCGGACCTG GAGGCAGCCC TGGGTCTCCG GATCAGGCCA 4620 
GCGGCCAAAG GGTCGCCGCA CGCACCTGTT CCCAGGGCCT CCACATCATG GCCCCTCCCT CGGGTTACCC 4690 
Fig. 4 (continued) 



CACAGCCTAG GCCGATTCGA CCTCTCTCCG CTGGGGCCCT CGCTGGCGTC CCTGCACCCT GGGAGCGCGA 4760 
GCGGCGCGCG GGCGGGGAAG CGCGGCCCAG ACCCCCGGGT CCGCCCGGAG CAGCTGCGCT GTCGGGGCCA 4830 
GGCCGGGCTC CCAGTGGATT CGCGGGCACA GACGCCCAGG ACCGCGCTCC CCACGTGGCG GAGGGACTGG 4900 
GGACCCGGGC ACCCGTCCTG CCCCTTCACC TTCCAGCTCC GCCTCCTCCG CGCGGACCCC GCCCCGTCCC 4970 
GACCCCTCCC GGGTCCCCGG CCCAGCCCCC TCCGGGCCCT CCCAGCCCCT CCCCTTCCTT TCCGCGGCCC 5040 
CGCCCTCTCC TCGCGGCGCG AGTTTCAGGC AGCGCTGCGT CCTGCTGCGC ACGTGGGAAG CCCTGGCCCC 5110 
GGCCACCCCC GCGATG 5126 
Fig. 6 

GTTTCAGGCA GCGCTGCGTC CTGCTGCGCA CGTGGGAAGC CCTGGCCCCG GCCACCCCCG CGATGCCGCG 70 
CGCTCCCCGC TGCCGAGCCG TGCGCTCCCT GCTGCGCAGC CACTACCGCG AGGTGCTGCC GCTGGCCACG 140 
TTCGTGCGGC GCCTGGGGCC CCAGGGCTGG CGGCTGGTGC AGCGCGGGGA CCCGGCGGCT TTCCGCGCGC 210 
TGGTGGCCCA GTGCCTGGTG TGCGTGCCCT GGGACGCACG GCCGCCCCCC GCCGCCCCCT CCTTCCGCCA 280 
GGTGTCCTGC CTGAAGGAGC TGGTGGCCCG AGTGCTGCAG AGGCTGTGCG AGCGCGGCGC GAAGAACGTG 350 
CTGGCCTTCG GCTTCGCGCT GCTGGACGGG GCCCGCGGGG GCCCCCCCGA GGCCTTCACC ACCAGCGTGC 420 
GCAGCTACCT GCCCAACACG GTGACCGACG CACTGCGGGG GAGCGGGGCG TGGGGGCTGC TGCTGCGCCG 490 
CGTGGGCGAC GACGTGCTGG TTCACCTGCT GGCACGCTGC GCGCTCTTTG TGCTGGTGGC TCCCAGCTGC 560 
GCCTACCAGG TGTGCGGGCC GCCGCTGTAC CAGCTCGGCG CTGCCACTCA GGCCCGGCCC CCGCCACACG 630 
CTAGTGGACC CCGAAGGCGT CTGGGATGCG AACGGGCCTG GAACCATAGC GTCAGGGAGG CCGGGGTCCC 700 
CCTGGGCCTG CCAGCCCCGG GTGCGAGGAG GCGCGGGGGC AGTGCCAGCC GAAGTCTGCC GTTGCCCAAG 770 
AGGCCCAGGC GTGGCGCTGC CCCTGAGCCG GAGCGGACGC CCGTTGGGCA GGGGTCCTGG GCCCACCCGG 840 
GCAGGACGCG TGGACCGAGT GACCGTGGTT TCTGTGTGGT GTCACCTGCC AGACCCGCCG AAGAAGCCAC 910 
CTCTTTGAAG GGTGCGCTCT CTGGCACGCG CCACTCCCAC CCATCCGTGG GCCGCCAGCA CCACGCGGGC 980 
CCCCCATCCA CATCGCGGCC ACCACGTCCC TGGGACACGC CTTGTCCCCC GGTGTACGCC GAGACCAAGC 1050 
ACTTCCTCTA CTCCTCAGGC GACAAGGAGC AGCTGCGGCC CTCCTTCCTA CTCAGCTCTC TGAGGCCCAG 1120 
CCTGACTGGC GCTCGGAGGC TCGTGGAGAC CATCTTTCTG GGTTCCAGGC CCTGGATGCC AGGGACTCCC 1190 
CGCAGGTTGC CCCGCCTGCC CCAGCGCTAC TGGCAAATGC GGCCCCTGTT TCTGGAGCTG CTTGGGAACC 1260 
ACGCGCAGTG CCCCTACGGG GTGCTCCTCA AGACGCACTG CCCGCTGCGA GCTGCGGTCA CCCCAGCAGC 1330 
CGGTGTCTGT GCCCGGGAGA AGCCCCAGGG CTCTGTGGCG GCCCCCGAGG AGGAGGACAC AGACCCCCGT 1400 
CGCCTGGTGC AGCTGCTCCG CCAGCACAGC AGCCCCTGGC AGGTGTACGG CTTCGTGCGG GCCTGCCTGC 1470 
GCCGGCTGGT GCCCCCAGGC CTCTGGGGCT CCAGGCACAA CGAACGCCGC TTCCTCAGGA ACACCAAGAA 1540 
GTTCATCTCC CTGGGGAAGC ATGCCAAGCT CTCGCTGCAG GAGCTGACGT GGAAGATGAG CGTGCGGGAC 1610 
TGCGCTTGGC TGCGCAGGAG CCCAGGGGTT GGCTGTGTTC CGGCCGCAGA GCACCGTCTG CGTGAGGAGA 1680 
TCCTGGCCAA GTTCCTGCAC TGGCTGATGA GTGTGTACGT CGTCGAGCTG CTCAGGTCTT TCTTTTATGT 1750 
CACGGAGACC ACGTTTCAAA AGAACAGGCT CTTTTTCTAC CGGAAGAGTG TCTGGAGCAA GTTGCAAAGC 1820 
ATTGGAATCA GACAGCACTT GAAGAGGGTG CAGCTGCGGG AGCTGTCGGA AGCAGAGGTC AGGCAGCATC 1890 
GGGAAGCCAG GCCCGCCCTG CTGACGTCCA GACTCCGCTT CATCCCCAAG CCTGACGGGC TGCGGCCGAT 1960 
TGTGAACATG GACTACGTCG TGGGAGCCAG AACGTTCCGC AGAGAAAAGA GGGCCGAGCG TCTCACCTCG 2030 
AGGGTGAAGG CACTGTTCAG CGTGCTCAAC TACGAGCGGG CGCGGCGCCC CGGCCTCCTG GGCGCCTCTG 2100 
TGCTGGGCCT GGACGATATC CACAGGGCCT GGCGCACCTT CGTGCTGCGT GTGCGGGCCC AGGACCCGCC 2170 
GCCTGAGCTG TACTTTGTCA AGGTGGATGT GACGGGCGCG TACGACACCA TCCCCCAGGA CAGGCTCACG 2240 
GAGGTCATCG CCAGCATCAT CAAACCCCAG AACACGTACT GCGTGCGTCG GTATGCCGTG GTCCAGAAGG 2310 
CCGCCCATGG GCACGTCCGC AAGGCCTTCA AGAGCCACGT CTCTACCTTG ACAGACCTCC AGCCGTACAT 2380 
GCGACAGTTC GTGGCTCACC TGCAGGAGAC CAGCCCGCTG AGGGATGCCG TCGTCATCGA GCAGAGCTCC 2450 
TCCCTGAATG AGGCCAGCAG TGGCCTCTTC GACGTCTTCC TACGCTTCAT GTGCCACCAC GCCGTGCGCA 2520 
TCAGGGGCAA GTCCTACGTC CAGTGCCAGG GGATCCCGCA GGGCTCCATC CTCTCCACGC TGCTCTGCAG 2590 
CCTGTGCTAC GGCGACATGG AGAACAAGCT GTTTGCGGGG ATTCGGCGGG ACGGGCTGCT CCTGCGTTTG 2660 
GTGGATGATT TCTTGTTGGT GACACCTCAC CTCACCCACG CGAAAACCTT CCTCAGGACC CTGGTCCGAG 2730 
GTGTCCCTGA GTATGGCTGC GTGGTGAACT TGCGGAAGAC AGTGGTGAAC TTCCCTGTAG AAGACGAGGC 2800 
CCTGGGTGGC ACGGCTTTTG TTCAGATGCC GGCCCACGGC CTATTCCCCT GGTGCGGCCT GCTGCTGGAT 2870 
ACCCGGACCC TGGAGGTGCA GAGCGACTAC TCCAGCTATG CCCGGACCTC CATCAGAGCC AGTCTCACCT 2940 
TCA^CCGCGG CTTCAAGGCT GGGAGGAACA TGCGTCGCAA ACTCTTTGGG GTCTTGCGGC TGAAGTGTCA 3010 
CAGCCTGTTT CTGGATTTGC AGGTGAACAG CCTCCAGACG GTGTGCACCA ACATCTACAA GATCCTCCTG 3080 
CTGCAGGCGT ACAGGTTTCA CGCATGTGTG CTGCAGCTCC CATTTCATCA GCAAGTTTGG AAGAACCCCA 3150 
CATTTTTCCT GCGCGTCATC TCTGACACGG CCTCCCTCTG CTACTCCATC CTGAAAGCCA AGAACGCAGG 3220 
GATGTCGCTG GGGGCCAAGG GCGCCGCCGG CCCTCTGCCC TCCGAGGCCG TGCAGTGGCT GTGCCACCAA 3290 
GCATTCCTGC TCAAGCTGAC TCGACACCGT GTCACCTACG TGCCACTCCT GGGGTCACTC AGGACAGCCC 3360 
AGACGCAGCT GAGTCGGAAG CTCCCGGGGA CGACGCTGAC TGCCCTGGAG GCCGCAGCCA ACCCGGCACT 3430 
GCCCTCAGAC TTCAAGACCA TCCTGGACTG ATGGCCACCC GCCCACAGCC AGGCCGAGAG CAGACACCAG 3500 
CAGCCCTGTC ACGCCGGGCT CTACGTCCCA GGGAGGGAGG GGCGGCCCAC ACCCAGGCCC GCACCGCTGG 3570 
GAGTCTGAGG CCTGAGTGAG TGTTTGGCCG AGGCCTGCAT GTCCGGCTGA AGGCTGAGTG TCCGGCTGAG 3640 
GCCTGAGCGA GTGTCCAGCC AAGGGCTGAG TGTCCAGCAC ACCTGCCGTC TTCACTTCCC CACAGGCTGG 3710 
CGCTCGGCTC CACCCCAGGG CCAGCTTTTC CTCACCAGGA GCCCGGCTTC CACTCCCCAC ATAGGAATAG 3780 
TCCATCCCCA GATTCGCCAT TGTTCACCCC TCGCCCTGCC CTCCTTTGCC TTCCACCCCC ACCATCCAGG 3850 
TGGAGACCCT GAGAAGGACC CTGGGAGCTC TGGGAATTTG GAGTGACCAA AGGTGTGCCC TGTACACAGG 3920 
CGAGGACCCT GCACCTGGAT GGGGGTCCCT GTGGGTCAAA TTGGGGGGAG GTGCTGTGGG AGTAAAATAC 3990 
TGAATATATG AGTTTTTCAG TTTTGAAAAA AAAAAAAAAA AAAAAAAAAA AA 
Fig. 10 

ACTTGAGCCC AAGAGTTCAA GGCTACGGTG AGCCATGATT GCAACACCAC ACGCCAGCCT TGGTGACAGA -11204 
ATGAGACCCT GTCTCAAAAA AAAAAAAAAA AATTGAAATA ATATAAAGCA TCTTCTCTGG CCACAGTGGA -11134 
ACAAAACCAG AAATCAACAA CAAGAGGAAT TTTGAAAACT ATACAAACAC ATGAAAATTA AACAATATAC -11064 
TTCTGAATGA CCAGTGAGTC AATGAAGAAA TTAAAAAGGA AATTGAAAAA TTTATTTAAG CAAATGATAA -10994 
CGGAAACATA ACCTCTCA?J\ ACCCACGGTA TACAGCAAAA GCAGTGCTAA. GAJVGG.AAGTT TATAGCTATA -10924 
AGCAGCTACA TCAAJ=lAAAGT AGAAAAGCCA GGCGCAGTGG CTCATGCCTG TAATCCCAGC ACTTTGGGAG -10854 
GCCA.AGGCGG GCAGATCGCC TGAGGTCAGG AGTTCGAGAC CAGCCTGACC AACACAGAGA AACCTTGTCG -10784 
CTACTAAAAA TACPAA.ATTA GCTGGGCATG GTGGCACATG CCTGTAATCC CAGCTACTCG GGAGGCTGAG -10714 
GC2>GGATAAC CGCTTGAACC CAGGAGGTGG AGGTTGCGGT GAGCCGGGAT TGCGCCATTG GACTCCAGCC -10644 
TGGGTA.QCAA GAGTG.AAACC CTGTCTCA.AG A?A.Zi_AAAAAA A_AGTAGA.AAA ACTTAAAAAT ACP_ACCTAA.T -10574 
GATGCACCTT AAAGAACTAG AAAAGCAAGA GCAAACTAAA CCTAAAATTG GTAAAAGA?A AGAAATAATA -10504 
AAGATCAGAG CAG2W.TAAA TGAAACTGAA AGATAACAAT AC.ajVAAGATC A_ACAAAATTA AAA.GTTGGTT -10434 
TTTTG^AAAG ATAAAC.AA.AA TTG.ACAJ-ACC TTTGCCCAGA CTA-AG.AA.AAA AGG.AAAGAAG ACCTA.AATA_A -10364 
ATA.^\.AGTCAG AGATGA.AAAA AGAGACATTA CPJ\CTGATAC CACAGAA.ATT CP-A.AGGATCA CTAG.AGGCTA -10294 
CTATGAGCAA CTGTACACTA ATAA.ATTG.AA A.A.ACCTAGAA A_AA.ATAG.ATA AATTCCTAGA TGCATAC.AAC -10224 
CTACCAAGAT TGAACCATGA AG.A.A.ATCCAA AGCCC.^i-AA.CA G.ACCAATAAC AATA.ATGGGA TTA.AAGCCAT -10154 
AATJ>A.AA.AGT CTCCTAGCAA AG.AGAAGCCC AGGACCCAAT GGCTTCCCTG CTGG.ATTTTA CCA.ATC.ATTT -10084 
A-AAG.A.AG.A.AT GAATTCC?_AT CCTACTCAA.A CTATTCTGAA A_WAGAGGA AAG.A.ATACTT CCAAA.CTCAT -10014 
TCTA'-.ATGGC CAGTATT.ACC CTG.ATTCC.AA P.ACCAG.ACAA AAA.CACATCA A_A.AA.CAAACA A.ACA_i_A.ziJi-AA -9944 
CAG.A.a.AG.A.A.A G^AAACTACA GGCC^^-ATATC CCTG.ATG.A.AT ACTG.ATAC.AA AA_ATCCTCAA C.AA-E-ACACTA -9874 
GCA.AACC.AAA TT/^-^-ACAACA CCTTCG.='_zi_AG ATCATTC.ATT GTG.ATC.AAGT GGG.ATTTATT CC.AGGGATGG -9804 
AAGGATGGTT CA.ACATATGC A.^_ATC.AATCA ATGTG.ATACA TCATCCCAAC AA.A.ATGAAGT ACi^J-A-AACTA -9734 
TATGATTATT TC^rTTTATG CAG.AA-A.A.AGC ATTTG.ATA.AA ATTCTGCACC CTTCATGATA A.A-iJ\CCCTCA -9664 
AAAjCaCCAGG TATACA.AG.AA ACATACAGGC C.AGGC.ACAGT GGCTC.AC.ACC TGCGATCCCA GCACTCTGGG -9594 
AGGCC.AAGGT GGG.ATG.ATTG CTTGGGCCCA GG.AGTTTG.AG ACTAGCCTGG GC.AAC-A.AA.AT G.AG.a.CCTGGT -9524 
CTACA-A-AAAA CTTTTTTAA-A AA-ATTAGCCA GGCATG.ATGG CATATGCCTG TAGTCCCAGC TAGTCTGGAG -9454 
GCTG^GGTCG GAGA-ATCACT TAAGCCTAGG AGGTCGAGGC TGC.AGTG.AGC CATG.^i-ACATG TCACTGTACT -9384 
CCAGCCTAGA CA-ACAG.AACA AGACCCCACT G.A.ATA-AGA.AG A.AGG.AGAAGG AG.a.AGGGAGA AGGG.AGGGAG -9314 
AAGGG.AGGAG GAGG^GA-AGG AGG.AGGTGGA GGAGA.AGTGG A.AGGGGA.AGG GGA.AGGGA-AA GAGG-A.AGA.AG -9244 
A-AGA-^-ACATA t-tTCA-ACATA ATA-AAAGCCC TAT.ATG.AC.AG ACCGAGGTAG TATTATGAGG A-A-A^^-ACTGAA -9174 
AGCCTTTCCT CTAAGATCTG GAA.AATGACA AGGGCCCACT TTCACCACTG TGATTCAJ\CA TAGTACTAGA -9104 
AGTCCTAGCT AGAGCA-ATCA GATA-AG.AGAA AG.a_A.ATAA-AA GGCATCCA.AA CTGGAA.AGGA AG-^^AGTCA-AA -9034 
TTATCCTGTT TGCAGATGAT ATG.ATCTTAT ATCTGGA.A-a_A GACTT?_AG.AC ACCACTA.z^A.A AACTATTAGA -8964 
GCTGAAATTT GGTACAGCAG G.ATACA.A.^-AT CAATGTACAA A-A-ATCAGTAG TATTTCTATA TTCC.AACAGC -8894 
A-A.ACA.ATCTG A.=>A.AAGAAAC CA.AAA.A.AGCA GCTAC?AATA A-A.ATTA.AACA GCTAGGAATT AACChP'JKGJ^J^ -8824 
GTGA.A.AGATC TCTACAATGA AAACTATA.AA ATGTTG.ATAA A.AG.A.AATTGA AGAGGGCACA i\APA?J\GAP-A -8754 
AG.ATATTCCA TGTTCATAGA TTGGA-AGA-AT A-A.ATACTGTT A.A.A-ATGTCCA TACTACCC.AA AGC.AATTTAC -8684 
AA.ATTC2i-?TG C^^ATCCCT.AT TA-A.AATACTA ATGACGTTCT TCAC.AG-^A-AT AG.A-AG.AAACA ATTCTAAG.AT -8614 
TTGTACAGAA CCACA.A-S-AGA CCC-AG.A.ATAG CCA-A.AGCTAT CCTGACCAAA A.AG.A.ACA.AA.A CTGG.i'.AGCAT -8544 
CACATTACCT GACTTCA.A.'^T TATACTACA-A AGCTATAGTA ACCC.A.AACTA C.ATGGTACTG GCATP.AA.AAC -8474 
AGATGAGACA TGG.ACCAGAG G.A.AC.AGA-ATA G.z\G.A-ATCCAG A.A-ACAAATCC ATGCATCTAC AGTG.AACTCA -8404 
TTTTTG.ACAA AGGTGCCAAG AACATACTTT GGGGA.A.A.AGA TA.ATCTCTTC A-ATAAATGGT GCTGG.AGGAJ\ -8334 
CTGG^TATCC ATATGCA.aAA TAACA-ATACT AG.^LACTCTGT CTCTCACCAT ATACAAAAGC AA.ATCA.A-A.AT -8264 
GGATGAA.AGG CTT.n.AATCTA A_^ACCTCA-^_A CTTTGCAACT ACTA.A.A.AGA-A A.ACACCGG.AG ?.A.ACTCTCCA -8194 
GGACATTGGA GTGGGCAA.AG ACTTCTTGAG TA-ATTCCCTG CAGGCACAGG CA-ACCAAAGC AAA.AACAGAC -8124 
A.AATGGGATC ATATCAAGTT A.AAA-AGCTTC TGCCCAGCAA AGG.A.A.ACAAT CA-ACAA-AGAG AAG.^^.GACAAC -8054 
CCACAGAATG GGAGAATATA TTTGCAAACT ATTCATCTAA CAAGGAATTA ATAACCAGTA TATATAAGGA -7984 
GCTCA.AACTA CTCTATAAGA AAAACACCTA ATAAGCTGAT TTTCAAAAAT AAGCA-A.AAGA TCTGGGTAGA -7914 
CATTTCTCAA A.ATAAGTCAT ACAAATGGCA AACAGGCATC TGAAA.ATGTG CTCAACACCA CTG.ATCATCA -7844 
GAGA.A.ATGCA AATCAAAACT ACTATGAGAG ATCATCTCAT CCCAGTTAAA ATGGCTTTTA TTC.^AAAGAC -7774 
AGGCA-ATAAC AA.ATGCCAGT GAGGATGTGG ATA.AAAGGAA ACCCTTGG.AC ACTGTTGGTG GGAATGGAAA -7704 
TTGCTACCAC TATGGAGA.AC AGTTTG.AAAG TTCCTCAAAA A.ACTAAAAAT AAAGCTACCA TAC-AGCA.ATC -7634 
CCATTGCTAG GTATATACTC CAAA.^.A.AGGG A.ATCAGTGTA TCAACAAGCT ATCTCCACTC CCACATTTAC -7564 
TGCAGC?CTG TTCATAGCAG CCAAGGTTTG G.a.AGCAACCT CAGTGTCCAT CA.AC.AGACGA ATGG-A.AA.AAG -7494 
AA.A.ATGTGGT GCAC.ATACAC AATGG.AGTAC TACGCAGCCA TAAA-AAAGAA TGAGATCCTG TCAGTTGCAA -7424 
CAGCATGGGG GGC.ACTGGTC AGTATGTTA-A GTG-A-A.ATA-AG CCAGGCACAG AA.AGACAA.AC TTTTCATGTT -7354 
CTCCCTTACT TGTGGGAGCA AAAATTA-AAA CA.ATTGACAT AGAAATAG.AG GAGAATGGTG GTTCTAGAGG -7284 
GGTGGGGGAC AGGGTGACTA GAGTCAACAA TA.ATTTATTG TATGTTTTAA AATAACTAAA AG.AGTATAAT -7214 
TGGGTTGTTT GTAACACAAA GAAAGG-ATAA ATGCTTGAAG GTGACAGATA CCCCATTTAC CCTGATGTGA -7144 
TTATTACACA TTGTATGCCT GTATCAAA.AT ATCTCATGTA TGCTATAGAT ATAAACCCTA CTATATTAAA -7074 
AATTA.AA.ATT TTAATGGCCA GGCACGGTGG CTCATGTCCG TAATCCCAGC ACTTTGGGAG GCCGAGGCGG -7004 
GTGGATCACC TGAGGTCAGG AGTTTG.AAAG CAGTCTGGCC ACCATGATGA AACCCTGTCT CTACTAAAGA -6934 
TACAA-AAATT AGCCAGGCGT GGTGGCACAT ACCTGTAGTC CCAACTACTC AGGAGGCTGA GACAGGAGAA -6864 
TTGCTTGAAC CTGGGAGGCG GAGGTTGCAG TG.AGCCGAGA TCATGCCACT GCACTGCAGC CTGGGTGACA -6794 
GAGCA.AGACT CC.ATCTCA.AA ACAAA-A.ACAA A-A.A-AAAGAAG ATTAAAATTG TAATTTTTAT GTACCGTATA -6724 
AATATATACT CTACTATATT AGAAGTTA-AA A.ATTAAAACA ATTATAAAAG GTAATXAACC ACTTAATCTA -6654 
AA.ATA.AGAAC A.ATGTATGTG GGGTTTCTAG CTTCTGAAGA AGTA.AA.AGTT ATGGCCACGA TGGCAGAAAT -6584 
GTGAGGAGGG ?J\CAGTGGAA GTTACTGTTG TTAGACGCTC ATACTCTCTG TAAGTGACTT AATTTTAACC -6514
AAAGACAGGC TGGGAGAAGT TAAAGAGGCA TTCTATAAGC CCTAAAACAA CTGCTAATAA TGGTGAAAGG -6444 
TAATCTCTAT TAATTACCAA TAATTACAGA TATCTCTAAA ATCGAGCTGC AGAATTGGCA CGTCTGATCA -6374
CACCGTCCTC TCATTCACGG TGCTTTTTTT CTTGTGTGCT TGGAGATTTT CGATTGTGTG TTCGTGTTTG -6304 
GTTAAACTTA ATCTGTATGA ATCCTGAAAC GAAAAATGGT GGTGATTTCC TCCAGAAGAA TTAGAGTACC -6234
TGGCAGGAAG CAGGTGGCTC TGTGGACCTG AGCCACTTCA ATCTTCAAGG GTCTCTGGCC AAGACCCAGG -6164
TGC^AGGCAG AGGCCTGATG ACCCGAGGAC AGGAJUIGCTC GGATGGGAJVG GGGCGATGAG AAGCCTGCCT -6094 
CG^TGGTGAG CAGCGCATGA AGTGCCCTTA TTTACGCTTT GCA-^AGATTG CTCTGGATAC CATCTGGAAA -6024
AGGCGGCCAG CGGGAATGCA AGGAGTCAGA AGCCTCCTGC TCAAACCCAG GCCAGCAGCT ATGGCGCCCA -5954 
CCCGGGCGTG TGCCAGAGGG AGAGGAGTCA AGGCACCTCG ?.AGTATGGCT TAAATCTTTT TTTCACCTGA -5884
AGCAGTGACC AAGGTGTATT CTGAGGGAAG CTTGAGTTAG GTGCCTTCTT TAAAACAGAA AGTCATGGAA -5814
GCACCCTTCT CAAGGGAAAA CCAGACGCCC GCTCTGCGGT CATTTACCTC TTTCCTCTCT CCCTCTCTTG -5744
CCCTCGCGGT TTCTGATCGG GACAGAGTGA CCCCCGTGGA GCTTCTCCGA GCCCGTGCTG AGGACCCTCT -5674
TGCAAAGGGC TCCACAGACC CCCGCCCTGG AGAGAGGAGT CTGAGCCTGG CTTAATAACA AACTGGGATG -5604
TGGCTGGGGG CGGACAGCGA CGGCGGGATT CAAAGACTTA ATTCCATGAG TAAATTCAAC CTTTCCACAT -5534
CCGAATGGAT TTGGATTTTA TCTTAATATT TTCTTAAATT TCATCAAATA ACATTCAGGA CTGCAGAAAT -5464
CCa'^AGGCGT AAAACAGGAA CTGAGCTATG TTTGCCAAGG TCCAAGGACT TAATAJ^CCAT GTTCAGAGGG -5394
ATTTTTCGCC CTAAGTACTT TTTATTGGTT TTCATAAGGT GGCTTAGGGT GCAAGGGAAA GTACACGAGG -5324 
AGAGGCCTGG GCGGCAGGGC TATGAGCACG GCAGGGCCAC CGGGGAGAGA GTCCCCGGCC TGGGAGGCTG -5254 
ACAGCAGGAC CACTGACCGT CCTCCCTGGG AGCTGCCACA TTGGGCAACG CGAAGGCGGC CACGCTGCGT -5184
GTGACTCAGG ACCCCATACC GGCTTCCTGG GCCCACCCAC ACTAACCCAG GAAGTCACGG AGCTCTGAAC -5114
CCGTGGAAAC GAACATGACC CTTGCCTGCC TGCTTCCCTG GGTGGGTCAA GGGTAATGAA GTGGTGTGCA -5044
GG^^ATGGCC ATGTAAATTA CACGACTCTG CTGATGGGGA CCGTTCCTTC CATCATTATT CATCTTCACC -4974 
CCcXaGGACT GAATGATTCC AGCAACTTCT TCGGGTGTGA CAAGCCATGA CAAAACTCAG TACB-A.ACACC -4904
ACTCTTTTAC TAGGCCCACA GAGCACGGSC CACACCCCTG ATATATTAJi.G AGTCCAGGAG AGATGAGGCT -4834
GCTTTCAGCC ACCAGGCTGG GGTGACAACA GCGGCTGAAC AGTCTGTTCC TCTAGACTAG TAGACCCTGG -4764
CAGGCACTCC CCCAGATTCT AGGGCCTGGT TGCTGCTTCC CGAGGGCGCC ATCTGCCCTG GAGACTCAGC -4694
CTGGGGTGCC ACACTGAGGC CAGCCCTGTC TCCACACCCT CCGCCTCCAG GCCTCAGCTT CTCCAGCAGC -4624 
TTCCTAAACC CTGGGTGGGC CGTGTTCCAG CGCTACTGTC TCACCTGTCC CACTGTGTCT TGTCTCAGCG -4554 
ACGTAGCTCG CACGGTTCCT CCTCACATGG GGTGTCTGTC TCCTTCCCCA ACACTCACAT GCGTTGAAGG -4484
GAGGAGATTC TGCGCCTCCC AGACTGGCTC CTCTGAGCCT GAACCTGGCT CGTGGCCCCC GATGCAGGTT -4414
CCVGGCGTCC GGCTGCACGC TGACCTCCAT TTCCAGGCGC TCCCCGTCTC CTGTCATCTG CCGGGGCCTG -4344
CCGGTGTGTT CTTCTGTTTC TGTGCTCCTT TCCACGTCCA GCTGCGTGTG TCTCTGCCCG CTAGGGTCTC -4274 
GGGGTTTTTA TAGGCATAGG ACGGGGGCGT GGTGGGCCAG GGCGCTCTTG GGA.^-ATGCAA CATTTGGGTG -4204 
TGA.!VAGTAGG AGTGCCTGTC CTCACCTAGG TCCACGGGCA CAGGCCTGGG GATGGAGCCC CCGCCAGGGA -4134 
CCCGCCCTTC TCTGCCCAGC ACTTTCCTGC CCCCCTCCCT CTGGAACACA GAGTGGCAGT TTCCACAAGC -4064
ACTAAGCATC CTCTTCCCAA AAGACCCAGC ATTGGCACCC CTGGACATTT GCCCCACAGC CCTGGGAATT -3994

IC AC GT GK cfA CGCACATCAT GTACACACTC CCGTCCACGA CCGACCCCCG CTGTTTTATT TTAATAGCTA -3924
CAj^^GCA GGG A.ajVTCCCTGC TAAAATGTCC TTT/^ACPAAC TGGTTAAACA AACGGGTCCA TCCGCACGGT -3854
GGACAGTTCC TCACAGTGAA GAGGAACATG CCGTTTATAA AGCCTGCAGG CATCTCAAGG GPATTACGCT -3784 
GAGTCAAAAC TGCCACCTCC ATGGGATACG TACGCAACAT GCTCAAAAAG AAAGAATTTC ACCCCATGGC -3714
AGGGGAGTGG TTAGGGGGGT TAAGGACGGT GGGGGCGGCA GCTGGGGGCT ACTGCACGCA CCTTTTACTA -3644
AAGCCAGTTT CCTGGTTCTG ATGGTATTGG CTCAGTTATG GGAGACTAAC CATAGGGGAG TGGGGATGGG -3574 
GG.^-ACCCGGA GGCTGTGCCA TCTTTGCCAT GCCCGAGTGT CCTGGGCAGG ATAATGCTCT AGAGATGCCC -3504
ACGTCCTGAT TCCCCCAAAC CTGTGGACAG AACCCGCCCG GCCCCAGGGC CTTTGCAGGT GTGATCTCCG -3434
TGAGGACCCT GAGGTCTGGG ATCCTTCGGG ACTACCTGCA GGCCCGAAAA GTAATCCAGG GGTTCTGGGA -3364 
AGAGGCGGGC AGGAGGGTCA GAGGGGGGCA GCCTCAGGAC GATGGAGGCA GTCAGTCTGA GGCTGAAAAG -3294
GGAGGGAGGG CCTCGAGCCC AGGCCTGCAA GCGCCTCCAG AAGCTGGAAA AAGCGGGGAA GGGACCCTCC -3224
ACGGAGCCTG CAGCAGGAAG GCACGGCTGG CCCTTAGCCC ACCAGGGCCC ATCGTGGACC TCCGGCCTCC -3154
GTGCCATAGG AGGGCACTCG CGCTGCCCTT CTAGCATGAA GTGTGTGGGG ATTTGCAGAA GC.^^-ACAGGAA -3084
ACCCATGCAC TGTGAATCTA GGATTATTTC AAAACAAAGG TTTACAGAAA CATCCAAGGA CAGGGCTGAA -3014 
GTGCCTCCGG GCAAGGGCAG GGCAGGCACG AGTGATTTTA TTTAGCTATT TTATTTTATT ^ACTTACTTT 
CTGAGACAGA GTTATGCTCT TGTTGCCCAG GCTGGAGTGC AGCGGCATGA TCTTGGCTCA CTGCAACCTC 
CGTCTCCTGG GTTCAAGCAA TTCTCGTGCC TCAGCCTCCC AAGTAGCTGG GATTTCAGGC GTGCACCACC 
ACACCCGGCT AATTTTGTAT TTTTAGTAGA GATGGGCTTT CACCATGTTG GTCAAGCTGA TCTCAAAATC 
CTGACCTCAG GTGATCCGCC CACCTCAGCC TCCCAAAGTG CTGGGATTAC AGGCATGAGC CACTGCACCT 
GGCCTATTTA ACCATTTTAA AACTTCCCTG GGCTCAAGTC ACACCCACTG GTAAGGAGTT CATGGAGTTC 
AATTTCCCCT TTACTCAGGA GTTACCCTCC TTTGATATTT TCTGTAATTC TTCGTAGACT GGGGATACAC 
CGTCTCTTGA CATATTCACA GTTTCTGTGA CCACCTGTTA TCCCATGGGA CCCACTGCAG GGGCAGCTGG 
GAGGCTGCAG GCTTCAGGTC CCAGTGGGGT TGCCATCTGC CAGTAGAAAC CTGATGTAGA ATCAGGGCGC 
AAGTGTGGAC ACTGTCCTGA ATCTCAATGT CTCAGTGTGT GCTGAAACAT GTAGAAATTA AAGTCCATCC 
CTCCTACTCT ACTGGGATTG AGCCCCTTCC CTATCCCCCC CCAGGGGCAG AGGAGTTCCT CTCACTCCTG 
TGGAGGAAGG AATGATACTT TGTTATTTTT CACTGCTGGT ACTGAATCCA CTGTTTCATT TGTTGGTTTG 
TTTGTTTTGT TTTGAGAGGC GGTTTCACTC TTGTTGCTCA GGCTGGAGGG AGTGCAATGG CGCGATCTTG 
GCTTACTGCA GCCTCTGCCT CCCAGGTTCA AGTGATTCTC CTGCTTCCGC CTCCCATTTG GCTGGGATTA 
CAGGCACCCG CCACCATGCC CAGCTAATTT TTTGTATTTT TAGTAGAGAC GGGGGTGGGT GGGGTTCACC 
ATGTTGGCCA GGCTGGTCTC GAACTTCTGA CCTCAGATGA TCCACCTGCC TCTGCCTCCT A6A.GTGCTGG -1894

GATTACAGGT GTGAGCCACC ATGCCCAGCT CAGAATTTAC TCTGTTTAGA AACATCTGGG TCTGAGGTAG -1824 

CAAT-Box 

GAAGCTCACC CCACTCAAGT GTTGTGGTGT TTTAj ^GCCAA~T| 3ATAGAATT TTTTTATTGT TGTTAGAACA -1754 

CTCTTGATGT TTTACACTGT GATGACTAAG ACATCATCAG CTTTTCAAAG ACACACTAAC TGCACCCATA -1684

ATACTGGGGT GTCTTCTGGG TATCAGCAAT CTTCATTGAA TGCCGGGAGG CGTTTCCTCG CCATGCACAT -1614 

GGTGTTAATT ACTCCAGCAT AATCTTCTGC TTCCATTTCT TCTCTTCCCT CTTTTAAAAT TGTGTTTTCT -1544

ATGTTGGCTT CTCTGCAGAG AJ\CCAGTGTA AGCTACA.LCT TAACTTTTGT TGGAACAAAT TT7CCAAl{cc] -1474
Sp1

[gCC^ CTTTGC CCTAGTGGCA GAGACAATTC ACPAACACAG CCCTTTAAAA AGGCTTAGGG ATCACTAAGG -1404

GGATTTCTAG AAGAGCGACC TGTAATCCTA AGTATTTACA AGACGAGGCT AACCTCCAGC GAGCGTGACA -1334

GCCCAGGGAG GGTGCGAGGC CTGTTCAAAT GCTAGCTCCA TAAATAAAGC AATTTCCTCC GGCAGTTTCT -1264

G?-^.^GTAGGA AAGGTTACAT TT?AGGTTGC GTTTGTTAGC ATTTCAGTGT TTGCCCACCT CAGCTACAGC -1194

ATCCCTGCAA GGCCTCGGGA GACCCAGAAG TTTCTCGCCC CCTTAGATCC A?ACTTGAGC P_-.CCCGGAGT' -1124

CTGGATTCCT GGGAAGTCCT CAGCTGTCCT GCGGTTGTGC CGGGGCCCCA GGTCTGGAGG GGACCAGTGG -1054 

CCGTGTGGCT TCTACTGCTG GGCTGGAAGT CGGGCCTCCT AGCTCTGCAG TCCGAGGCTT GG.-.GCCAGGT -984

GCCTGGACCC CGAGGCTGCC CTCCACCCTG TGCGGGCGGG ATGTGACCAG ATGTTGGCCT CATCTGCCAG -914 

ACAGAGTGCC GGGGCCCAGG GTCA.=.GGCCG TTGTGGCTGG TGTGAGGCGC CCGGTGCGCG GCCAGCAGGA -844

CCAC-Box Sp1
GCGCCTGGCT CCATTT [cCCA~CCC| rTTCTCG ACGGGAc jcGC CCCj SGTGGGT GATTAACAGA TTTGGGGTGG -774

TTTGCTCATG GTGGGGACCC CTCGCCGCCT GAGAACCTGC AAAGAGAAAT GACGGGCCTG TGTCAAGGAG -704

CCCA.^GTCGC GGGGAAGTGT TGCAGGGAGG CACTCCGGGA GGTCCCGCGT GCCCGTCCAG GG.-.GCAATGC -634 
AP-2 

GTCCTCGGGT TCC ^CCCCAG CCGj CGTCTAC GCGCCTCCGT CCTCCCCTTC ACGTCCGGCA TiCC-TGGTGC -564 

CCGGAGCCCG ACGCCCCGCG TCCGGACCTG GAGGCAGCCC TGGGTCTCCG GATCAGGCCA GCGGCCAAAG -494

GGTCGCCGCA CGCACCTGTT CCCAGGGCCT CCACATCATG GCCCCTCCCT CGGGTTACCC CACAGCCTAG -424

GCCGATTCGA CCTCTCTCCG CTGGGGCCCT CGCTGGCGTC CCTGCACCCT GGGAGCGCGA GCGGC |3CGCG| -354 
Sp1

[^^-GGAAG CGCGGCCCAG ACCCCCGGGT |CCGCCC| 3GAG CAGCTGCGCT GTCGGGGCCA GGCCGGGCTC -284 
c-Myc

CCAGTGGATT CGCGGGCACA GACGCCCAGG ACCGCGCTCC c jCACGTG^ CG^GAGGGACTGG GGACCCGGGC -214 

ACCCGTCCTG CCCCTTCACC TTCCAGCTCC GCCTCCTCCG CGCGGACC jc^GCCc) cGTCCC G?^CCCCTCCC -144

GGGTCCCCGG CCCAGCCCCC TCCGGGCCCT CCCAGCCC CT CCCCTTCCTT TCCGCGGCCjc CGCCC| rCTCC -74

c-Myc

TCGCGGCGCG AGTT TCAGGC AGCGCTGCGT CCTGCTGCC ^ACGTG^ G.aAG CCCTGGCCCC GGCCACCCCC -4 
GCGATG ^ 